* [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based
@ 2017-07-05 21:08 Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
                   ` (20 more replies)
  0 siblings, 21 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status at byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at byte granularity
rather than in sectors.

The overall conversion currently looks like:
part 1: bdrv_is_allocated (this series, v3 was at [1])
part 2: dirty-bitmap (v4 is posted [2]; needs reviews)
part 3: bdrv_get_block_status (v2 is posted [3] and is mostly reviewed)
part 4: upcoming series, for .bdrv_co_block_status (second half of v1 [4])

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v4

Depends on Max's block branch (which in turn includes Kevin's)

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06077.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00269.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00427.html
[4] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02642.html

Changes since v3:
new patch 4/21 to avoid a semantic change in 5/21 [Kevin]
new assertion in 8/21 [Kevin]
whitespace rebase churn in vvfat in 19/21

Most patches have now been reviewed multiple times, although Kevin has
not yet reviewed the second half of the series.

001/21:[----] [--] 'blockjob: Track job ratelimits via bytes, not sectors'
002/21:[----] [--] 'trace: Show blockjob actions via bytes, not sectors'
003/21:[----] [--] 'stream: Switch stream_populate() to byte-based'
004/21:[down] 'stream: Drop reached_end for stream_complete()'
005/21:[0002] [FC] 'stream: Switch stream_run() to byte-based'
006/21:[----] [--] 'commit: Switch commit_populate() to byte-based'
007/21:[----] [--] 'commit: Switch commit_run() to byte-based'
008/21:[0005] [FC] 'mirror: Switch MirrorBlockJob to byte-based'
009/21:[----] [--] 'mirror: Switch mirror_do_zero_or_discard() to byte-based'
010/21:[----] [--] 'mirror: Update signature of mirror_clip_sectors()'
011/21:[----] [--] 'mirror: Switch mirror_cow_align() to byte-based'
012/21:[----] [--] 'mirror: Switch mirror_do_read() to byte-based'
013/21:[----] [--] 'mirror: Switch mirror_iteration() to byte-based'
014/21:[----] [--] 'block: Drop unused bdrv_round_sectors_to_clusters()'
015/21:[----] [--] 'backup: Switch BackupBlockJob to byte-based'
016/21:[----] [--] 'backup: Switch block_backup.h to byte-based'
017/21:[----] [--] 'backup: Switch backup_do_cow() to byte-based'
018/21:[----] [--] 'backup: Switch backup_run() to byte-based'
019/21:[0004] [FC] 'block: Make bdrv_is_allocated() byte-based'
020/21:[----] [--] 'block: Minimize raw use of bds->total_sectors'
021/21:[----] [--] 'block: Make bdrv_is_allocated_above() byte-based'

Eric Blake (21):
  blockjob: Track job ratelimits via bytes, not sectors
  trace: Show blockjob actions via bytes, not sectors
  stream: Switch stream_populate() to byte-based
  stream: Drop reached_end for stream_complete()
  stream: Switch stream_run() to byte-based
  commit: Switch commit_populate() to byte-based
  commit: Switch commit_run() to byte-based
  mirror: Switch MirrorBlockJob to byte-based
  mirror: Switch mirror_do_zero_or_discard() to byte-based
  mirror: Update signature of mirror_clip_sectors()
  mirror: Switch mirror_cow_align() to byte-based
  mirror: Switch mirror_do_read() to byte-based
  mirror: Switch mirror_iteration() to byte-based
  block: Drop unused bdrv_round_sectors_to_clusters()
  backup: Switch BackupBlockJob to byte-based
  backup: Switch block_backup.h to byte-based
  backup: Switch backup_do_cow() to byte-based
  backup: Switch backup_run() to byte-based
  block: Make bdrv_is_allocated() byte-based
  block: Minimize raw use of bds->total_sectors
  block: Make bdrv_is_allocated_above() byte-based

 include/block/block.h        |  10 +-
 include/block/block_backup.h |  11 +-
 include/qemu/ratelimit.h     |   3 +-
 block/backup.c               | 130 ++++++++----------
 block/commit.c               |  54 ++++----
 block/io.c                   |  92 +++++++------
 block/mirror.c               | 305 ++++++++++++++++++++++---------------------
 block/replication.c          |  29 ++--
 block/stream.c               |  37 +++---
 block/vvfat.c                |  34 +++--
 migration/block.c            |   9 +-
 qemu-img.c                   |  15 ++-
 qemu-io-cmds.c               |  70 +++++-----
 block/trace-events           |  14 +-
 14 files changed, 400 insertions(+), 413 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 02/21] trace: Show blockjob actions " Eric Blake
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

The user interface specifies job rate limits in bytes/second.
It's pointless to have our internal representation track things
in sectors/second, particularly since we want to move away from
sector-based interfaces.

Fix up a doc typo found while verifying that the ratelimit
code handles the scaling difference.

Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be
cleaned up later when functions are converted to iterate over
images by bytes rather than by sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
v2: adjust commit message based on review; no code change
---
 include/qemu/ratelimit.h |  3 ++-
 block/backup.c           |  5 +++--
 block/commit.c           |  5 +++--
 block/mirror.c           | 13 +++++++------
 block/stream.c           |  5 +++--
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index 8da1232..8dece48 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -24,7 +24,8 @@ typedef struct {

 /** Calculate and return delay for next request in ns
  *
- * Record that we sent @p n data units. If we may send more data units
+ * Record that we sent @n data units (where @n matches the scale chosen
+ * during ratelimit_set_speed). If we may send more data units
  * in the current time slice, return 0 (i.e. no delay). Otherwise
  * return the amount of time (in ns) until the start of the next time
  * slice that will permit sending the next chunk of data.
diff --git a/block/backup.c b/block/backup.c
index 5387fbd..9ca1d8e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -208,7 +208,7 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -359,7 +359,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
      */
     if (job->common.speed) {
         uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-                                                      job->sectors_read);
+                                                      job->sectors_read *
+                                                      BDRV_SECTOR_SIZE);
         job->sectors_read = 0;
         block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
     } else {
diff --git a/block/commit.c b/block/commit.c
index 524bd54..6993994 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -209,7 +209,8 @@ static void coroutine_fn commit_run(void *opaque)
         s->common.offset += n * BDRV_SECTOR_SIZE;

         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit, n);
+            delay_ns = ratelimit_calculate_delay(&s->limit,
+                                                 n * BDRV_SECTOR_SIZE);
         }
     }

@@ -231,7 +232,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static const BlockJobDriver commit_job_driver = {
diff --git a/block/mirror.c b/block/mirror.c
index 61a862d..eb27efc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -396,7 +396,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
     while (nb_chunks > 0 && sector_num < end) {
         int64_t ret;
-        int io_sectors, io_sectors_acct;
+        int io_sectors;
+        int64_t io_bytes_acct;
         BlockDriverState *file;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
@@ -444,16 +445,16 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
             io_sectors = mirror_do_read(s, sector_num, io_sectors);
-            io_sectors_acct = io_sectors;
+            io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
             mirror_do_zero_or_discard(s, sector_num, io_sectors,
                                       mirror_method == MIRROR_METHOD_DISCARD);
             if (write_zeroes_ok) {
-                io_sectors_acct = 0;
+                io_bytes_acct = 0;
             } else {
-                io_sectors_acct = io_sectors;
+                io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
             }
             break;
         default:
@@ -463,7 +464,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         sector_num += io_sectors;
         nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
         if (s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit, io_sectors_acct);
+            delay_ns = ratelimit_calculate_delay(&s->limit, io_bytes_acct);
         }
     }
     return delay_ns;
@@ -929,7 +930,7 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static void mirror_complete(BlockJob *job, Error **errp)
diff --git a/block/stream.c b/block/stream.c
index 52d329f..29273a5 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -191,7 +191,8 @@ static void coroutine_fn stream_run(void *opaque)
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit, n);
+            delay_ns = ratelimit_calculate_delay(&s->limit,
+                                                 n * BDRV_SECTOR_SIZE);
         }
     }

@@ -220,7 +221,7 @@ static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static const BlockJobDriver stream_job_driver = {
-- 
2.9.4


* [Qemu-devel] [PATCH v4 02/21] trace: Show blockjob actions via bytes, not sectors
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 03/21] stream: Switch stream_populate() to byte-based Eric Blake
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

Upcoming patches are going to switch to byte-based interfaces
instead of sector-based.  Even worse, trace_backup_do_cow_enter()
had a weird mix of cluster and sector indices.

The trace interface is low enough that there are no stability
guarantees, and therefore nothing wrong with changing our units,
even in cases like trace_backup_do_cow_skip() where we are not
changing the trace output.  So make the tracing uniformly use
bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
v2: improve commit message, no code change
---
 block/backup.c     | 16 ++++++++++------
 block/commit.c     |  3 ++-
 block/mirror.c     | 26 +++++++++++++++++---------
 block/stream.c     |  3 ++-
 block/trace-events | 14 +++++++-------
 5 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 9ca1d8e..06431ac 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -102,6 +102,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     void *bounce_buffer = NULL;
     int ret = 0;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
+    int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
     int64_t start, end;
     int n;

@@ -110,18 +111,20 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     start = sector_num / sectors_per_cluster;
     end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-    trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
+    trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+                              sector_num * BDRV_SECTOR_SIZE,
+                              nb_sectors * BDRV_SECTOR_SIZE);

     wait_for_overlapping_requests(job, start, end);
     cow_request_begin(&cow_request, job, start, end);

     for (; start < end; start++) {
         if (test_bit(start, job->done_bitmap)) {
-            trace_backup_do_cow_skip(job, start);
+            trace_backup_do_cow_skip(job, start * bytes_per_cluster);
             continue; /* already copied */
         }

-        trace_backup_do_cow_process(job, start);
+        trace_backup_do_cow_process(job, start * bytes_per_cluster);

         n = MIN(sectors_per_cluster,
                 job->common.len / BDRV_SECTOR_SIZE -
@@ -138,7 +141,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                             bounce_qiov.size, &bounce_qiov,
                             is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start, ret);
+            trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
@@ -154,7 +157,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
         }
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start, ret);
+            trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
@@ -177,7 +180,8 @@ out:

     cow_request_end(&cow_request);

-    trace_backup_do_cow_return(job, sector_num, nb_sectors, ret);
+    trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
+                               nb_sectors * BDRV_SECTOR_SIZE, ret);

     qemu_co_rwlock_unlock(&job->flush_rwlock);

diff --git a/block/commit.c b/block/commit.c
index 6993994..4cda7f2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -190,7 +190,8 @@ static void coroutine_fn commit_run(void *opaque)
                                       COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
                                       &n);
         copy = (ret == 1);
-        trace_commit_one_iteration(s, sector_num, n, ret);
+        trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+                                   n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
             ret = commit_populate(s->top, s->base, sector_num, n, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
diff --git a/block/mirror.c b/block/mirror.c
index eb27efc..b4dfe95 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -103,7 +103,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     int64_t chunk_num;
     int i, nb_chunks, sectors_per_chunk;

-    trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
+    trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
+                                op->nb_sectors * BDRV_SECTOR_SIZE, ret);

     s->in_flight--;
     s->sectors_in_flight -= op->nb_sectors;
@@ -268,7 +269,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);

     while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
+                                     s->in_flight);
         mirror_wait_for_io(s);
     }

@@ -294,7 +296,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     /* Copy the dirty cluster.  */
     s->in_flight++;
     s->sectors_in_flight += nb_sectors;
-    trace_mirror_one_iteration(s, sector_num, nb_sectors);
+    trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+                               nb_sectors * BDRV_SECTOR_SIZE);

     blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov, 0,
                    mirror_read_complete, op);
@@ -347,14 +350,16 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     if (sector_num < 0) {
         bdrv_set_dirty_iter(s->dbi, 0);
         sector_num = bdrv_dirty_iter_next(s->dbi);
-        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
+        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
+                                  BDRV_SECTOR_SIZE);
         assert(sector_num >= 0);
     }
     bdrv_dirty_bitmap_unlock(s->dirty_bitmap);

     first_chunk = sector_num / sectors_per_chunk;
     while (test_bit(first_chunk, s->in_flight_bitmap)) {
-        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
+                                     s->in_flight);
         mirror_wait_for_io(s);
     }

@@ -433,7 +438,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         }

         while (s->in_flight >= MAX_IN_FLIGHT) {
-            trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+            trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
+                                         s->in_flight);
             mirror_wait_for_io(s);
         }

@@ -823,7 +829,8 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
             if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
                 (cnt == 0 && s->in_flight > 0)) {
-                trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
+                trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
+                                   s->buf_free_count, s->in_flight);
                 mirror_wait_for_io(s);
                 continue;
             } else if (cnt != 0) {
@@ -864,7 +871,7 @@ static void coroutine_fn mirror_run(void *opaque)
              * whether to switch to target check one last time if I/O has
              * come in the meanwhile, and if not flush the data to disk.
              */
-            trace_mirror_before_drain(s, cnt);
+            trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);

             bdrv_drained_begin(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
@@ -883,7 +890,8 @@ static void coroutine_fn mirror_run(void *opaque)
         }

         ret = 0;
-        trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
+        trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
+                                  s->synced, delay_ns);
         if (!s->synced) {
             block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
             if (block_job_is_cancelled(&s->common)) {
diff --git a/block/stream.c b/block/stream.c
index 29273a5..6cb3939 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -168,7 +168,8 @@ static void coroutine_fn stream_run(void *opaque)

             copy = (ret == 1);
         }
-        trace_stream_one_iteration(s, sector_num, n, ret);
+        trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+                                   n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
             ret = stream_populate(blk, sector_num, n, buf);
         }
diff --git a/block/trace-events b/block/trace-events
index 752de6a..4a4df25 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -15,11 +15,11 @@ bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p off
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"

 # block/stream.c
-stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
 stream_start(void *bs, void *base, void *s) "bs %p base %p s %p"

 # block/commit.c
-commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+commit_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
 commit_start(void *bs, void *base, void *top, void *s) "bs %p base %p top %p s %p"

 # block/mirror.c
@@ -28,14 +28,14 @@ mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced, uint64_t delay_ns) "s %p dirty count %"PRId64" synced %d delay %"PRIu64"ns"
-mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
-mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors, int ret) "s %p sector_num %"PRId64" nb_sectors %d ret %d"
+mirror_one_iteration(void *s, int64_t offset, uint64_t bytes) "s %p offset %" PRId64 " bytes %" PRIu64
+mirror_iteration_done(void *s, int64_t offset, uint64_t bytes, int ret) "s %p offset %" PRId64 " bytes %" PRIu64 " ret %d"
 mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
-mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
+mirror_yield_in_flight(void *s, int64_t offset, int in_flight) "s %p offset %" PRId64 " in_flight %d"

 # block/backup.c
-backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
-backup_do_cow_return(void *job, int64_t sector_num, int nb_sectors, int ret) "job %p sector_num %"PRId64" nb_sectors %d ret %d"
+backup_do_cow_enter(void *job, int64_t start, int64_t offset, uint64_t bytes) "job %p start %" PRId64 " offset %" PRId64 " bytes %" PRIu64
+backup_do_cow_return(void *job, int64_t offset, uint64_t bytes, int ret) "job %p offset %" PRId64 " bytes %" PRIu64 " ret %d"
 backup_do_cow_skip(void *job, int64_t start) "job %p start %"PRId64
 backup_do_cow_process(void *job, int64_t start) "job %p start %"PRId64
 backup_do_cow_read_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"
-- 
2.9.4


* [Qemu-devel] [PATCH v4 03/21] stream: Switch stream_populate() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 02/21] trace: Show blockjob actions " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete() Eric Blake
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
v2: no change
---
 block/stream.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 6cb3939..746d525 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -41,20 +41,20 @@ typedef struct StreamBlockJob {
 } StreamBlockJob;

 static int coroutine_fn stream_populate(BlockBackend *blk,
-                                        int64_t sector_num, int nb_sectors,
+                                        int64_t offset, uint64_t bytes,
                                         void *buf)
 {
     struct iovec iov = {
         .iov_base = buf,
-        .iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
+        .iov_len  = bytes,
     };
     QEMUIOVector qiov;

+    assert(bytes < SIZE_MAX);
     qemu_iovec_init_external(&qiov, &iov, 1);

     /* Copy-on-read the unallocated clusters */
-    return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, &qiov,
-                         BDRV_REQ_COPY_ON_READ);
+    return blk_co_preadv(blk, offset, qiov.size, &qiov, BDRV_REQ_COPY_ON_READ);
 }

 typedef struct {
@@ -171,7 +171,8 @@ static void coroutine_fn stream_run(void *opaque)
         trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
                                    n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = stream_populate(blk, sector_num, n, buf);
+            ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
+                                  n * BDRV_SECTOR_SIZE, buf);
         }
         if (ret < 0) {
             BlockErrorAction action =
-- 
2.9.4


* [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (2 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 03/21] stream: Switch stream_populate() to byte-based Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06  0:05   ` John Snow
  2017-07-06 10:38   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based Eric Blake
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

stream_complete() skips the work of rewriting the backing file if
the job was cancelled, if data->reached_end is false, or if there
was an error detected (non-zero data->ret) during the streaming.
But note that in stream_run(), data->reached_end is only set if the
loop ran to completion, and data->ret is only 0 in two cases:
either the loop ran to completion (possibly by cancellation, but
stream_complete checks for that), or we took an early goto out
because there is no bs->backing.  Thus, we can preserve the same
semantics without the use of reached_end, by merely checking for
bs->backing (and logically, if there was no backing file, streaming
is a no-op, so there is no backing file to rewrite).

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: new patch
---
 block/stream.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 746d525..12f1659 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -59,7 +59,6 @@ static int coroutine_fn stream_populate(BlockBackend *blk,

 typedef struct {
     int ret;
-    bool reached_end;
 } StreamCompleteData;

 static void stream_complete(BlockJob *job, void *opaque)
@@ -70,7 +69,7 @@ static void stream_complete(BlockJob *job, void *opaque)
     BlockDriverState *base = s->base;
     Error *local_err = NULL;

-    if (!block_job_is_cancelled(&s->common) && data->reached_end &&
+    if (!block_job_is_cancelled(&s->common) && bs->backing &&
         data->ret == 0) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
@@ -211,7 +210,6 @@ out:
     /* Modify backing chain and close BDSes in main loop */
     data = g_malloc(sizeof(*data));
     data->ret = ret;
-    data->reached_end = sector_num == end;
     block_job_defer_to_main_loop(&s->common, stream_complete, data);
 }

-- 
2.9.4


* [Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (3 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete() Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 10:39   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 06/21] commit: Switch commit_populate() " Eric Blake
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of streaming to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v4: avoid reached_end change by rebasing to earlier changes [Kevin], R-b kept
v2-v3: no change
---
 block/stream.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 12f1659..e3dd2ac 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -107,12 +107,11 @@ static void coroutine_fn stream_run(void *opaque)
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
     BlockDriverState *base = s->base;
-    int64_t sector_num = 0;
-    int64_t end = -1;
+    int64_t offset = 0;
     uint64_t delay_ns = 0;
     int error = 0;
     int ret = 0;
-    int n = 0;
+    int n = 0; /* sectors */
     void *buf;

     if (!bs->backing) {
@@ -125,7 +124,6 @@ static void coroutine_fn stream_run(void *opaque)
         goto out;
     }

-    end = s->common.len >> BDRV_SECTOR_BITS;
     buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);

     /* Turn on copy-on-read for the whole block device so that guest read
@@ -137,7 +135,7 @@ static void coroutine_fn stream_run(void *opaque)
         bdrv_enable_copy_on_read(bs);
     }

-    for (sector_num = 0; sector_num < end; sector_num += n) {
+    for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
         bool copy;

         /* Note that even when no rate limit is applied we need to yield
@@ -150,28 +148,26 @@ static void coroutine_fn stream_run(void *opaque)

         copy = false;

-        ret = bdrv_is_allocated(bs, sector_num,
+        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
                                 STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
-             * known-unallocated area [sector_num, sector_num+n).  */
+             * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
             ret = bdrv_is_allocated_above(backing_bs(bs), base,
-                                          sector_num, n, &n);
+                                          offset / BDRV_SECTOR_SIZE, n, &n);

             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
-                n = end - sector_num;
+                n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
             }

             copy = (ret == 1);
         }
-        trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-                                   n * BDRV_SECTOR_SIZE, ret);
+        trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
-                                  n * BDRV_SECTOR_SIZE, buf);
+            ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
         }
         if (ret < 0) {
             BlockErrorAction action =
-- 
2.9.4

* [Qemu-devel] [PATCH v4 06/21] commit: Switch commit_populate() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (4 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 07/21] commit: Switch commit_run() " Eric Blake
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
v2: no change
---
 block/commit.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 4cda7f2..6f67d78 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -47,26 +47,25 @@ typedef struct CommitBlockJob {
 } CommitBlockJob;

 static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
-                                        int64_t sector_num, int nb_sectors,
+                                        int64_t offset, uint64_t bytes,
                                         void *buf)
 {
     int ret = 0;
     QEMUIOVector qiov;
     struct iovec iov = {
         .iov_base = buf,
-        .iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+        .iov_len = bytes,
     };

+    assert(bytes < SIZE_MAX);
     qemu_iovec_init_external(&qiov, &iov, 1);

-    ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
-                        qiov.size, &qiov, 0);
+    ret = blk_co_preadv(bs, offset, qiov.size, &qiov, 0);
     if (ret < 0) {
         return ret;
     }

-    ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
-                         qiov.size, &qiov, 0);
+    ret = blk_co_pwritev(base, offset, qiov.size, &qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -193,7 +192,9 @@ static void coroutine_fn commit_run(void *opaque)
         trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
                                    n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = commit_populate(s->top, s->base, sector_num, n, buf);
+            ret = commit_populate(s->top, s->base,
+                                  sector_num * BDRV_SECTOR_SIZE,
+                                  n * BDRV_SECTOR_SIZE, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
         }
         if (ret < 0) {
-- 
2.9.4

* [Qemu-devel] [PATCH v4 07/21] commit: Switch commit_run() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (5 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 06/21] commit: Switch commit_populate() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob " Eric Blake
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of committing to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

---
v2: no change
---
 block/commit.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 6f67d78..c3a7bca 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -143,17 +143,16 @@ static void coroutine_fn commit_run(void *opaque)
 {
     CommitBlockJob *s = opaque;
     CommitCompleteData *data;
-    int64_t sector_num, end;
+    int64_t offset;
     uint64_t delay_ns = 0;
     int ret = 0;
-    int n = 0;
+    int n = 0; /* sectors */
     void *buf = NULL;
     int bytes_written = 0;
     int64_t base_len;

     ret = s->common.len = blk_getlength(s->top);

-
     if (s->common.len < 0) {
         goto out;
     }
@@ -170,10 +169,9 @@ static void coroutine_fn commit_run(void *opaque)
         }
     }

-    end = s->common.len >> BDRV_SECTOR_BITS;
     buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-    for (sector_num = 0; sector_num < end; sector_num += n) {
+    for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
         bool copy;

         /* Note that even when no rate limit is applied we need to yield
@@ -185,15 +183,13 @@ static void coroutine_fn commit_run(void *opaque)
         }
         /* Copy if allocated above the base */
         ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-                                      sector_num,
+                                      offset / BDRV_SECTOR_SIZE,
                                       COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
                                       &n);
         copy = (ret == 1);
-        trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-                                   n * BDRV_SECTOR_SIZE, ret);
+        trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = commit_populate(s->top, s->base,
-                                  sector_num * BDRV_SECTOR_SIZE,
+            ret = commit_populate(s->top, s->base, offset,
                                   n * BDRV_SECTOR_SIZE, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
         }
-- 
2.9.4

* [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (6 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 07/21] commit: Switch commit_run() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06  0:14   ` John Snow
  2017-07-06 10:42   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 09/21] mirror: Switch mirror_do_zero_or_discard() " Eric Blake
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to the
buffer size.

Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
(necessary for interaction with sector-based dirty bitmaps, until
a later patch converts those to be byte-based) does not suffer from
truncation problems.

[checkpatch has a false positive on use of MIN() in this patch]

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: add assertion and formatting tweak [Kevin], R-b dropped
v2-v3: no change
---
 block/mirror.c | 82 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b4dfe95..9aca0cb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -24,9 +24,8 @@

 #define SLICE_TIME    100000000ULL /* ns */
 #define MAX_IN_FLIGHT 16
-#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 Mb */
-#define DEFAULT_MIRROR_BUF_SIZE \
-    (MAX_IN_FLIGHT * MAX_IO_SECTORS * BDRV_SECTOR_SIZE)
+#define MAX_IO_BYTES (1 << 20) /* 1 Mb */
+#define DEFAULT_MIRROR_BUF_SIZE (MAX_IN_FLIGHT * MAX_IO_BYTES)

 /* The mirroring buffer is a list of granularity-sized chunks.
  * Free chunks are organized in a list.
@@ -67,11 +66,11 @@ typedef struct MirrorBlockJob {
     uint64_t last_pause_ns;
     unsigned long *in_flight_bitmap;
     int in_flight;
-    int64_t sectors_in_flight;
+    int64_t bytes_in_flight;
     int ret;
     bool unmap;
     bool waiting_for_io;
-    int target_cluster_sectors;
+    int target_cluster_size;
     int max_iov;
     bool initial_zeroing_ongoing;
 } MirrorBlockJob;
@@ -79,8 +78,8 @@ typedef struct MirrorBlockJob {
 typedef struct MirrorOp {
     MirrorBlockJob *s;
     QEMUIOVector qiov;
-    int64_t sector_num;
-    int nb_sectors;
+    int64_t offset;
+    uint64_t bytes;
 } MirrorOp;

 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
@@ -101,13 +100,12 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     MirrorBlockJob *s = op->s;
     struct iovec *iov;
     int64_t chunk_num;
-    int i, nb_chunks, sectors_per_chunk;
+    int i, nb_chunks;

-    trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
-                                op->nb_sectors * BDRV_SECTOR_SIZE, ret);
+    trace_mirror_iteration_done(s, op->offset, op->bytes, ret);

     s->in_flight--;
-    s->sectors_in_flight -= op->nb_sectors;
+    s->bytes_in_flight -= op->bytes;
     iov = op->qiov.iov;
     for (i = 0; i < op->qiov.niov; i++) {
         MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
@@ -115,16 +113,15 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
         s->buf_free_count++;
     }

-    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-    chunk_num = op->sector_num / sectors_per_chunk;
-    nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
+    chunk_num = op->offset / s->granularity;
+    nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
     bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
     if (ret >= 0) {
         if (s->cow_bitmap) {
             bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
         }
         if (!s->initial_zeroing_ongoing) {
-            s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
+            s->common.offset += op->bytes;
         }
     }
     qemu_iovec_destroy(&op->qiov);
@@ -144,7 +141,8 @@ static void mirror_write_complete(void *opaque, int ret)
     if (ret < 0) {
         BlockErrorAction action;

-        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+                              op->bytes >> BDRV_SECTOR_BITS);
         action = mirror_error_action(s, false, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
             s->ret = ret;
@@ -163,7 +161,8 @@ static void mirror_read_complete(void *opaque, int ret)
     if (ret < 0) {
         BlockErrorAction action;

-        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+                              op->bytes >> BDRV_SECTOR_BITS);
         action = mirror_error_action(s, true, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
             s->ret = ret;
@@ -171,7 +170,7 @@ static void mirror_read_complete(void *opaque, int ret)

         mirror_iteration_done(op, ret);
     } else {
-        blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
+        blk_aio_pwritev(s->target, op->offset, &op->qiov,
                         0, mirror_write_complete, op);
     }
     aio_context_release(blk_get_aio_context(s->common.blk));
@@ -211,7 +210,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
         align_nb_sectors = max_sectors;
         if (need_cow) {
             align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
-                                               s->target_cluster_sectors);
+                                               s->target_cluster_size >>
+                                               BDRV_SECTOR_BITS);
         }
     }
     /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
@@ -277,8 +277,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_new(MirrorOp, 1);
     op->s = s;
-    op->sector_num = sector_num;
-    op->nb_sectors = nb_sectors;
+    op->offset = sector_num * BDRV_SECTOR_SIZE;
+    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;

     /* Now make a QEMUIOVector taking enough granularity-sized chunks
      * from s->buf_free.
@@ -295,7 +295,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,

     /* Copy the dirty cluster.  */
     s->in_flight++;
-    s->sectors_in_flight += nb_sectors;
+    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
     trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
                                nb_sectors * BDRV_SECTOR_SIZE);

@@ -315,19 +315,17 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
      * so the freeing in mirror_iteration_done is nop. */
     op = g_new0(MirrorOp, 1);
     op->s = s;
-    op->sector_num = sector_num;
-    op->nb_sectors = nb_sectors;
+    op->offset = sector_num * BDRV_SECTOR_SIZE;
+    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;

     s->in_flight++;
-    s->sectors_in_flight += nb_sectors;
+    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
     if (is_discard) {
         blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
-                         op->nb_sectors << BDRV_SECTOR_BITS,
-                         mirror_write_complete, op);
+                         op->bytes, mirror_write_complete, op);
     } else {
         blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
-                              op->nb_sectors * BDRV_SECTOR_SIZE,
-                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
+                              op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
                               mirror_write_complete, op);
     }
 }
@@ -342,8 +340,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
     int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
-    int max_io_sectors = MAX((s->buf_size >> BDRV_SECTOR_BITS) / MAX_IN_FLIGHT,
-                             MAX_IO_SECTORS);
+    int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

     bdrv_dirty_bitmap_lock(s->dirty_bitmap);
     sector_num = bdrv_dirty_iter_next(s->dbi);
@@ -415,9 +412,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
                                           nb_chunks * sectors_per_chunk,
                                           &io_sectors, &file);
         if (ret < 0) {
-            io_sectors = MIN(nb_chunks * sectors_per_chunk, max_io_sectors);
+            io_sectors = MIN(nb_chunks * sectors_per_chunk,
+                             max_io_bytes >> BDRV_SECTOR_BITS);
         } else if (ret & BDRV_BLOCK_DATA) {
-            io_sectors = MIN(io_sectors, max_io_sectors);
+            io_sectors = MIN(io_sectors, max_io_bytes >> BDRV_SECTOR_BITS);
         }

         io_sectors -= io_sectors % sectors_per_chunk;
@@ -719,7 +717,6 @@ static void coroutine_fn mirror_run(void *opaque)
     char backing_filename[2]; /* we only need 2 characters because we are only
                                  checking for a NULL string */
     int ret = 0;
-    int target_cluster_size = BDRV_SECTOR_SIZE;

     if (block_job_is_cancelled(&s->common)) {
         goto immediate_exit;
@@ -771,14 +768,15 @@ static void coroutine_fn mirror_run(void *opaque)
     bdrv_get_backing_filename(target_bs, backing_filename,
                               sizeof(backing_filename));
     if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
-        target_cluster_size = bdi.cluster_size;
+        s->target_cluster_size = bdi.cluster_size;
+    } else {
+        s->target_cluster_size = BDRV_SECTOR_SIZE;
     }
     if (backing_filename[0] && !target_bs->backing
-        && s->granularity < target_cluster_size) {
-        s->buf_size = MAX(s->buf_size, target_cluster_size);
+        && s->granularity < s->target_cluster_size) {
+        s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
     }
-    s->target_cluster_sectors = target_cluster_size >> BDRV_SECTOR_BITS;
     s->max_iov = MIN(bs->bl.max_iov, target_bs->bl.max_iov);

     s->buf = qemu_try_blockalign(bs, s->buf_size);
@@ -814,10 +812,10 @@ static void coroutine_fn mirror_run(void *opaque)
         cnt = bdrv_get_dirty_count(s->dirty_bitmap);
         /* s->common.offset contains the number of bytes already processed so
          * far, cnt is the number of dirty sectors remaining and
-         * s->sectors_in_flight is the number of sectors currently being
+         * s->bytes_in_flight is the number of bytes currently being
          * processed; together those are the current total operation length */
-        s->common.len = s->common.offset +
-                        (cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
+        s->common.len = s->common.offset + s->bytes_in_flight +
+            cnt * BDRV_SECTOR_SIZE;

         /* Note that even when no rate limit is applied we need to yield
          * periodically with no pending I/O so that bdrv_drain_all() returns.
@@ -1150,6 +1148,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     }

     assert ((granularity & (granularity - 1)) == 0);
+    /* Granularity must be large enough for sector-based dirty bitmap */
+    assert(granularity >= BDRV_SECTOR_SIZE);

     if (buf_size < 0) {
         error_setg(errp, "Invalid parameter 'buf-size'");
-- 
2.9.4

* [Qemu-devel] [PATCH v4 09/21] mirror: Switch mirror_do_zero_or_discard() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (7 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 10/21] mirror: Update signature of mirror_clip_sectors() Eric Blake
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

---
v2-v4: no change
---
 block/mirror.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9aca0cb..2736d0b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -305,8 +305,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
 }

 static void mirror_do_zero_or_discard(MirrorBlockJob *s,
-                                      int64_t sector_num,
-                                      int nb_sectors,
+                                      int64_t offset,
+                                      uint64_t bytes,
                                       bool is_discard)
 {
     MirrorOp *op;
@@ -315,16 +315,16 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
      * so the freeing in mirror_iteration_done is nop. */
     op = g_new0(MirrorOp, 1);
     op->s = s;
-    op->offset = sector_num * BDRV_SECTOR_SIZE;
-    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+    op->offset = offset;
+    op->bytes = bytes;

     s->in_flight++;
-    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
+    s->bytes_in_flight += bytes;
     if (is_discard) {
-        blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
+        blk_aio_pdiscard(s->target, offset,
                          op->bytes, mirror_write_complete, op);
     } else {
-        blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
+        blk_aio_pwrite_zeroes(s->target, offset,
                               op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
                               mirror_write_complete, op);
     }
@@ -453,7 +453,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
-            mirror_do_zero_or_discard(s, sector_num, io_sectors,
+            mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+                                      io_sectors * BDRV_SECTOR_SIZE,
                                       mirror_method == MIRROR_METHOD_DISCARD);
             if (write_zeroes_ok) {
                 io_bytes_acct = 0;
@@ -657,7 +658,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
                 continue;
             }

-            mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
+            mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+                                      nb_sectors * BDRV_SECTOR_SIZE, false);
             sector_num += nb_sectors;
         }

-- 
2.9.4

* [Qemu-devel] [PATCH v4 10/21] mirror: Update signature of mirror_clip_sectors()
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (8 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 09/21] mirror: Switch mirror_do_zero_or_discard() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based Eric Blake
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

Rather than having a void function that modifies its input
in-place as the output, change the signature to reduce a layer
of indirection and return the result.

Suggested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

---
v3-v4: no change
v2: new patch
---
 block/mirror.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2736d0b..70682b6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,12 +176,12 @@ static void mirror_read_complete(void *opaque, int ret)
     aio_context_release(blk_get_aio_context(s->common.blk));
 }

-static inline void mirror_clip_sectors(MirrorBlockJob *s,
-                                       int64_t sector_num,
-                                       int *nb_sectors)
+static inline int mirror_clip_sectors(MirrorBlockJob *s,
+                                      int64_t sector_num,
+                                      int nb_sectors)
 {
-    *nb_sectors = MIN(*nb_sectors,
-                      s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
+    return MIN(nb_sectors,
+               s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

 /* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
@@ -216,7 +216,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
     }
     /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
      * that doesn't matter because it's already the end of source image. */
-    mirror_clip_sectors(s, align_sector_num, &align_nb_sectors);
+    align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
+                                           align_nb_sectors);

     ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
     *sector_num = align_sector_num;
@@ -445,7 +446,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             return 0;
         }

-        mirror_clip_sectors(s, sector_num, &io_sectors);
+        io_sectors = mirror_clip_sectors(s, sector_num, io_sectors);
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
             io_sectors = mirror_do_read(s, sector_num, io_sectors);
-- 
2.9.4

* [Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (9 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 10/21] mirror: Update signature of mirror_clip_sectors() Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 11:16   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() " Eric Blake
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and add mirror_clip_bytes() as a
counterpart to mirror_clip_sectors().  Some of the conversion is
a bit tricky, requiring temporaries to convert between units; it
will be cleaned up in a following patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v3-v4: no change
v2: tweak mirror_clip_bytes() signature to match previous patch
---
 block/mirror.c | 64 ++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 70682b6..374cefd 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,6 +176,15 @@ static void mirror_read_complete(void *opaque, int ret)
     aio_context_release(blk_get_aio_context(s->common.blk));
 }

+/* Clip bytes relative to offset to not exceed end-of-file */
+static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
+                                        int64_t offset,
+                                        int64_t bytes)
+{
+    return MIN(bytes, s->bdev_length - offset);
+}
+
+/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
 static inline int mirror_clip_sectors(MirrorBlockJob *s,
                                       int64_t sector_num,
                                       int nb_sectors)
@@ -184,44 +193,39 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
                s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

-/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
- * return the offset of the adjusted tail sector against original. */
-static int mirror_cow_align(MirrorBlockJob *s,
-                            int64_t *sector_num,
-                            int *nb_sectors)
+/* Round offset and/or bytes to target cluster if COW is needed, and
+ * return the offset of the adjusted tail against original. */
+static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
+                            unsigned int *bytes)
 {
     bool need_cow;
     int ret = 0;
-    int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
-    int64_t align_sector_num = *sector_num;
-    int align_nb_sectors = *nb_sectors;
-    int max_sectors = chunk_sectors * s->max_iov;
+    int64_t align_offset = *offset;
+    unsigned int align_bytes = *bytes;
+    int max_bytes = s->granularity * s->max_iov;

-    need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
-    need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
+    need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
+    need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
                           s->cow_bitmap);
     if (need_cow) {
-        bdrv_round_sectors_to_clusters(blk_bs(s->target), *sector_num,
-                                       *nb_sectors, &align_sector_num,
-                                       &align_nb_sectors);
+        bdrv_round_to_clusters(blk_bs(s->target), *offset, *bytes,
+                               &align_offset, &align_bytes);
     }

-    if (align_nb_sectors > max_sectors) {
-        align_nb_sectors = max_sectors;
+    if (align_bytes > max_bytes) {
+        align_bytes = max_bytes;
         if (need_cow) {
-            align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
-                                               s->target_cluster_size >>
-                                               BDRV_SECTOR_BITS);
+            align_bytes = QEMU_ALIGN_DOWN(align_bytes,
+                                          s->target_cluster_size);
         }
     }
-    /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
+    /* Clipping may result in align_bytes unaligned to chunk boundary, but
      * that doesn't matter because it's already the end of source image. */
-    align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
-                                           align_nb_sectors);
+    align_bytes = mirror_clip_bytes(s, align_offset, align_bytes);

-    ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
-    *sector_num = align_sector_num;
-    *nb_sectors = align_nb_sectors;
+    ret = align_offset + align_bytes - (*offset + *bytes);
+    *offset = align_offset;
+    *bytes = align_bytes;
     assert(ret >= 0);
     return ret;
 }
@@ -257,10 +261,18 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
     nb_sectors = MIN(max_sectors, nb_sectors);
     assert(nb_sectors);
+    assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
     ret = nb_sectors;

     if (s->cow_bitmap) {
-        ret += mirror_cow_align(s, &sector_num, &nb_sectors);
+        int64_t offset = sector_num * BDRV_SECTOR_SIZE;
+        unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
+        int gap;
+
+        gap = mirror_cow_align(s, &offset, &bytes);
+        sector_num = offset / BDRV_SECTOR_SIZE;
+        nb_sectors = bytes / BDRV_SECTOR_SIZE;
+        ret += gap / BDRV_SECTOR_SIZE;
     }
     assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
     /* The sector range must meet granularity because:
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (10 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 13:30   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() " Eric Blake
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v3-v4: no change
v2: rebase to earlier changes
---
 block/mirror.c | 75 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 374cefd..d3325d0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -196,7 +196,7 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
-                            unsigned int *bytes)
+                            uint64_t *bytes)
 {
     bool need_cow;
     int ret = 0;
@@ -204,6 +204,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
     unsigned int align_bytes = *bytes;
     int max_bytes = s->granularity * s->max_iov;

+    assert(*bytes < INT_MAX);
     need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
     need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
                           s->cow_bitmap);
@@ -239,59 +240,50 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
 }

 /* Submit async read while handling COW.
- * Returns: The number of sectors copied after and including sector_num,
- *          excluding any sectors copied prior to sector_num due to alignment.
- *          This will be nb_sectors if no alignment is necessary, or
- *          (new_end - sector_num) if tail is rounded up or down due to
+ * Returns: The number of bytes copied after and including offset,
+ *          excluding any bytes copied prior to offset due to alignment.
+ *          This will be @bytes if no alignment is necessary, or
+ *          (new_end - offset) if tail is rounded up or down due to
  *          alignment or buffer limit.
  */
-static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
-                          int nb_sectors)
+static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
+                               uint64_t bytes)
 {
     BlockBackend *source = s->common.blk;
-    int sectors_per_chunk, nb_chunks;
-    int ret;
+    int nb_chunks;
+    uint64_t ret;
     MirrorOp *op;
-    int max_sectors;
+    uint64_t max_bytes;

-    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-    max_sectors = sectors_per_chunk * s->max_iov;
+    max_bytes = s->granularity * s->max_iov;

     /* We can only handle as much as buf_size at a time. */
-    nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
-    nb_sectors = MIN(max_sectors, nb_sectors);
-    assert(nb_sectors);
-    assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
-    ret = nb_sectors;
+    bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
+    assert(bytes);
+    assert(bytes < BDRV_REQUEST_MAX_BYTES);
+    ret = bytes;

     if (s->cow_bitmap) {
-        int64_t offset = sector_num * BDRV_SECTOR_SIZE;
-        unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
-        int gap;
-
-        gap = mirror_cow_align(s, &offset, &bytes);
-        sector_num = offset / BDRV_SECTOR_SIZE;
-        nb_sectors = bytes / BDRV_SECTOR_SIZE;
-        ret += gap / BDRV_SECTOR_SIZE;
+        ret += mirror_cow_align(s, &offset, &bytes);
     }
-    assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
-    /* The sector range must meet granularity because:
+    assert(bytes <= s->buf_size);
+    /* The range will be sector-aligned because:
      * 1) Caller passes in aligned values;
-     * 2) mirror_cow_align is used only when target cluster is larger. */
-    assert(!(sector_num % sectors_per_chunk));
-    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+     * 2) mirror_cow_align is used only when target cluster is larger.
+     * But it might not be cluster-aligned at end-of-file. */
+    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+    nb_chunks = DIV_ROUND_UP(bytes, s->granularity);

     while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
-                                     s->in_flight);
+        trace_mirror_yield_in_flight(s, offset, s->in_flight);
         mirror_wait_for_io(s);
     }

     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_new(MirrorOp, 1);
     op->s = s;
-    op->offset = sector_num * BDRV_SECTOR_SIZE;
-    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+    op->offset = offset;
+    op->bytes = bytes;

     /* Now make a QEMUIOVector taking enough granularity-sized chunks
      * from s->buf_free.
@@ -299,7 +291,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     qemu_iovec_init(&op->qiov, nb_chunks);
     while (nb_chunks-- > 0) {
         MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
-        size_t remaining = nb_sectors * BDRV_SECTOR_SIZE - op->qiov.size;
+        size_t remaining = bytes - op->qiov.size;

         QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
         s->buf_free_count--;
@@ -308,12 +300,10 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,

     /* Copy the dirty cluster.  */
     s->in_flight++;
-    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
-    trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-                               nb_sectors * BDRV_SECTOR_SIZE);
+    s->bytes_in_flight += bytes;
+    trace_mirror_one_iteration(s, offset, bytes);

-    blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov, 0,
-                   mirror_read_complete, op);
+    blk_aio_preadv(source, offset, &op->qiov, 0, mirror_read_complete, op);
     return ret;
 }

@@ -461,8 +451,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         io_sectors = mirror_clip_sectors(s, sector_num, io_sectors);
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
-            io_sectors = mirror_do_read(s, sector_num, io_sectors);
-            io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
+            io_bytes_acct = mirror_do_read(s, sector_num * BDRV_SECTOR_SIZE,
+                                           io_sectors * BDRV_SECTOR_SIZE);
+            io_sectors = io_bytes_acct / BDRV_SECTOR_SIZE;
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
-- 
2.9.4


* [Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (11 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 13:47   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters() Eric Blake
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of mirroring to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are both sector-aligned and multiples of the granularity).  Drop
the now-unused mirror_clip_sectors().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
v4: no change
v3: rebase to Paolo's thread-safety changes, R-b kept
v2: straightforward rebase to earlier mirror_clip_bytes() change, R-b kept
---
 block/mirror.c | 105 +++++++++++++++++++++++++--------------------------------
 1 file changed, 46 insertions(+), 59 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d3325d0..f54a8d7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -184,15 +184,6 @@ static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
     return MIN(bytes, s->bdev_length - offset);
 }

-/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
-static inline int mirror_clip_sectors(MirrorBlockJob *s,
-                                      int64_t sector_num,
-                                      int nb_sectors)
-{
-    return MIN(nb_sectors,
-               s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
-}
-
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
@@ -336,30 +327,28 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->source;
-    int64_t sector_num, first_chunk;
+    int64_t offset, first_chunk;
     uint64_t delay_ns = 0;
     /* At least the first dirty chunk is mirrored in one iteration. */
     int nb_chunks = 1;
-    int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
     int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
     int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

     bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-    sector_num = bdrv_dirty_iter_next(s->dbi);
-    if (sector_num < 0) {
+    offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+    if (offset < 0) {
         bdrv_set_dirty_iter(s->dbi, 0);
-        sector_num = bdrv_dirty_iter_next(s->dbi);
+        offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
         trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
                                   BDRV_SECTOR_SIZE);
-        assert(sector_num >= 0);
+        assert(offset >= 0);
     }
     bdrv_dirty_bitmap_unlock(s->dirty_bitmap);

-    first_chunk = sector_num / sectors_per_chunk;
+    first_chunk = offset / s->granularity;
     while (test_bit(first_chunk, s->in_flight_bitmap)) {
-        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
-                                     s->in_flight);
+        trace_mirror_yield_in_flight(s, offset, s->in_flight);
         mirror_wait_for_io(s);
     }

@@ -368,25 +357,26 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     /* Find the number of consecutive dirty chunks following the first dirty
      * one, and wait for in flight requests in them. */
     bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-    while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
+    while (nb_chunks * s->granularity < s->buf_size) {
         int64_t next_dirty;
-        int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
-        int64_t next_chunk = next_sector / sectors_per_chunk;
-        if (next_sector >= end ||
-            !bdrv_get_dirty_locked(source, s->dirty_bitmap, next_sector)) {
+        int64_t next_offset = offset + nb_chunks * s->granularity;
+        int64_t next_chunk = next_offset / s->granularity;
+        if (next_offset >= s->bdev_length ||
+            !bdrv_get_dirty_locked(source, s->dirty_bitmap,
+                                   next_offset >> BDRV_SECTOR_BITS)) {
             break;
         }
         if (test_bit(next_chunk, s->in_flight_bitmap)) {
             break;
         }

-        next_dirty = bdrv_dirty_iter_next(s->dbi);
-        if (next_dirty > next_sector || next_dirty < 0) {
+        next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+        if (next_dirty > next_offset || next_dirty < 0) {
             /* The bitmap iterator's cache is stale, refresh it */
-            bdrv_set_dirty_iter(s->dbi, next_sector);
-            next_dirty = bdrv_dirty_iter_next(s->dbi);
+            bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
+            next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
         }
-        assert(next_dirty == next_sector);
+        assert(next_dirty == next_offset);
         nb_chunks++;
     }

@@ -394,14 +384,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
      * calling bdrv_get_block_status_above could yield - if some blocks are
      * marked dirty in this window, we need to know.
      */
-    bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, sector_num,
-                                  nb_chunks * sectors_per_chunk);
+    bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, offset >> BDRV_SECTOR_BITS,
+                                   nb_chunks * sectors_per_chunk);
     bdrv_dirty_bitmap_unlock(s->dirty_bitmap);

-    bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
-    while (nb_chunks > 0 && sector_num < end) {
+    bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
+    while (nb_chunks > 0 && offset < s->bdev_length) {
         int64_t ret;
         int io_sectors;
+        unsigned int io_bytes;
         int64_t io_bytes_acct;
         BlockDriverState *file;
         enum MirrorMethod {
@@ -410,28 +401,28 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             MIRROR_METHOD_DISCARD
         } mirror_method = MIRROR_METHOD_COPY;

-        assert(!(sector_num % sectors_per_chunk));
-        ret = bdrv_get_block_status_above(source, NULL, sector_num,
+        assert(!(offset % s->granularity));
+        ret = bdrv_get_block_status_above(source, NULL,
+                                          offset >> BDRV_SECTOR_BITS,
                                           nb_chunks * sectors_per_chunk,
                                           &io_sectors, &file);
+        io_bytes = io_sectors * BDRV_SECTOR_SIZE;
         if (ret < 0) {
-            io_sectors = MIN(nb_chunks * sectors_per_chunk,
-                             max_io_bytes >> BDRV_SECTOR_BITS);
+            io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
         } else if (ret & BDRV_BLOCK_DATA) {
-            io_sectors = MIN(io_sectors, max_io_bytes >> BDRV_SECTOR_BITS);
+            io_bytes = MIN(io_bytes, max_io_bytes);
         }

-        io_sectors -= io_sectors % sectors_per_chunk;
-        if (io_sectors < sectors_per_chunk) {
-            io_sectors = sectors_per_chunk;
+        io_bytes -= io_bytes % s->granularity;
+        if (io_bytes < s->granularity) {
+            io_bytes = s->granularity;
         } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
-            int64_t target_sector_num;
-            int target_nb_sectors;
-            bdrv_round_sectors_to_clusters(blk_bs(s->target), sector_num,
-                                           io_sectors,  &target_sector_num,
-                                           &target_nb_sectors);
-            if (target_sector_num == sector_num &&
-                target_nb_sectors == io_sectors) {
+            int64_t target_offset;
+            unsigned int target_bytes;
+            bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
+                                   &target_offset, &target_bytes);
+            if (target_offset == offset &&
+                target_bytes == io_bytes) {
                 mirror_method = ret & BDRV_BLOCK_ZERO ?
                                     MIRROR_METHOD_ZERO :
                                     MIRROR_METHOD_DISCARD;
@@ -439,8 +430,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         }

         while (s->in_flight >= MAX_IN_FLIGHT) {
-            trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
-                                         s->in_flight);
+            trace_mirror_yield_in_flight(s, offset, s->in_flight);
             mirror_wait_for_io(s);
         }

@@ -448,30 +438,27 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             return 0;
         }

-        io_sectors = mirror_clip_sectors(s, sector_num, io_sectors);
+        io_bytes = mirror_clip_bytes(s, offset, io_bytes);
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
-            io_bytes_acct = mirror_do_read(s, sector_num * BDRV_SECTOR_SIZE,
-                                           io_sectors * BDRV_SECTOR_SIZE);
-            io_sectors = io_bytes_acct / BDRV_SECTOR_SIZE;
+            io_bytes = io_bytes_acct = mirror_do_read(s, offset, io_bytes);
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
-            mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
-                                      io_sectors * BDRV_SECTOR_SIZE,
+            mirror_do_zero_or_discard(s, offset, io_bytes,
                                       mirror_method == MIRROR_METHOD_DISCARD);
             if (write_zeroes_ok) {
                 io_bytes_acct = 0;
             } else {
-                io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
+                io_bytes_acct = io_bytes;
             }
             break;
         default:
             abort();
         }
-        assert(io_sectors);
-        sector_num += io_sectors;
-        nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
+        assert(io_bytes);
+        offset += io_bytes;
+        nb_chunks -= DIV_ROUND_UP(io_bytes, s->granularity);
         if (s->common.speed) {
             delay_ns = ratelimit_calculate_delay(&s->limit, io_bytes_acct);
         }
-- 
2.9.4


* [Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters()
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (12 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 13:49   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based Eric Blake
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, jcody, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz

Now that the last user [mirror_iteration()] has been converted to using
bytes, we no longer need a function to round sectors to clusters.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
v3-v4: no change
v2: hoist to earlier series, no change
---
 include/block/block.h |  4 ----
 block/io.c            | 21 ---------------------
 2 files changed, 25 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index f432ea6..a9dc753 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -472,10 +472,6 @@ const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-                                    int64_t sector_num, int nb_sectors,
-                                    int64_t *cluster_sector_num,
-                                    int *cluster_nb_sectors);
 void bdrv_round_to_clusters(BlockDriverState *bs,
                             int64_t offset, unsigned int bytes,
                             int64_t *cluster_offset,
diff --git a/block/io.c b/block/io.c
index 9186c5a..f662c97 100644
--- a/block/io.c
+++ b/block/io.c
@@ -419,27 +419,6 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 }

 /**
- * Round a region to cluster boundaries (sector-based)
- */
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-                                    int64_t sector_num, int nb_sectors,
-                                    int64_t *cluster_sector_num,
-                                    int *cluster_nb_sectors)
-{
-    BlockDriverInfo bdi;
-
-    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-        *cluster_sector_num = sector_num;
-        *cluster_nb_sectors = nb_sectors;
-    } else {
-        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
-        *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
-        *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
-                                            nb_sectors, c);
-    }
-}
-
-/**
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-- 
2.9.4


* [Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (13 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters() Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 13:59   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h " Eric Blake
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to
tracking progress.  Drop a redundant local variable bytes_per_cluster.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
v2-v4: no change
---
 block/backup.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 06431ac..4e64710 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -39,7 +39,7 @@ typedef struct BackupBlockJob {
     BlockdevOnError on_source_error;
     BlockdevOnError on_target_error;
     CoRwlock flush_rwlock;
-    uint64_t sectors_read;
+    uint64_t bytes_read;
     unsigned long *done_bitmap;
     int64_t cluster_size;
     bool compress;
@@ -102,16 +102,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     void *bounce_buffer = NULL;
     int ret = 0;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
-    int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
-    int64_t start, end;
-    int n;
+    int64_t start, end; /* clusters */
+    int n; /* bytes */

     qemu_co_rwlock_rdlock(&job->flush_rwlock);

     start = sector_num / sectors_per_cluster;
     end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-    trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+    trace_backup_do_cow_enter(job, start * job->cluster_size,
                               sector_num * BDRV_SECTOR_SIZE,
                               nb_sectors * BDRV_SECTOR_SIZE);

@@ -120,28 +119,27 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,

     for (; start < end; start++) {
         if (test_bit(start, job->done_bitmap)) {
-            trace_backup_do_cow_skip(job, start * bytes_per_cluster);
+            trace_backup_do_cow_skip(job, start * job->cluster_size);
             continue; /* already copied */
         }

-        trace_backup_do_cow_process(job, start * bytes_per_cluster);
+        trace_backup_do_cow_process(job, start * job->cluster_size);

-        n = MIN(sectors_per_cluster,
-                job->common.len / BDRV_SECTOR_SIZE -
-                start * sectors_per_cluster);
+        n = MIN(job->cluster_size,
+                job->common.len - start * job->cluster_size);

         if (!bounce_buffer) {
             bounce_buffer = blk_blockalign(blk, job->cluster_size);
         }
         iov.iov_base = bounce_buffer;
-        iov.iov_len = n * BDRV_SECTOR_SIZE;
+        iov.iov_len = n;
         qemu_iovec_init_external(&bounce_qiov, &iov, 1);

         ret = blk_co_preadv(blk, start * job->cluster_size,
                             bounce_qiov.size, &bounce_qiov,
                             is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
+            trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
@@ -157,7 +155,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
         }
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
+            trace_backup_do_cow_write_fail(job, start * job->cluster_size, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
@@ -169,8 +167,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         /* Publish progress, guest I/O counts as progress too.  Note that the
          * offset field is an opaque progress value, it is not a disk offset.
          */
-        job->sectors_read += n;
-        job->common.offset += n * BDRV_SECTOR_SIZE;
+        job->bytes_read += n;
+        job->common.offset += n;
     }

 out:
@@ -363,9 +361,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
      */
     if (job->common.speed) {
         uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-                                                      job->sectors_read *
-                                                      BDRV_SECTOR_SIZE);
-        job->sectors_read = 0;
+                                                      job->bytes_read);
+        job->bytes_read = 0;
         block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
     } else {
         block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
-- 
2.9.4


* [Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (14 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 14:11   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() " Eric Blake
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz, Wen Congyang, Xie Changlong

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
the public interface to backup jobs (no semantic change), including
a change to CowRequest to track by bytes instead of cluster indices.

Note that the public interface (a starting point plus the length of
the following range) still differs from the internal interface
(starting and end points); this patch does not change that difference.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v3-v4: no change
v2: change a couple more parameter names
---
 include/block/block_backup.h | 11 +++++------
 block/backup.c               | 33 ++++++++++++++++-----------------
 block/replication.c          | 12 ++++++++----
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 8a75947..994a3bd 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -21,17 +21,16 @@
 #include "block/block_int.h"

 typedef struct CowRequest {
-    int64_t start;
-    int64_t end;
+    int64_t start_byte;
+    int64_t end_byte;
     QLIST_ENTRY(CowRequest) list;
     CoQueue wait_queue; /* coroutines blocked on this request */
 } CowRequest;

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-                                          int nb_sectors);
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+                                          uint64_t bytes);
 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-                              int64_t sector_num,
-                              int nb_sectors);
+                              int64_t offset, uint64_t bytes);
 void backup_cow_request_end(CowRequest *req);

 void backup_do_checkpoint(BlockJob *job, Error **errp);
diff --git a/block/backup.c b/block/backup.c
index 4e64710..cfbd921 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -55,7 +55,7 @@ static inline int64_t cluster_size_sectors(BackupBlockJob *job)

 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
-                                                       int64_t start,
+                                                       int64_t offset,
                                                        int64_t end)
 {
     CowRequest *req;
@@ -64,7 +64,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
     do {
         retry = false;
         QLIST_FOREACH(req, &job->inflight_reqs, list) {
-            if (end > req->start && start < req->end) {
+            if (end > req->start_byte && offset < req->end_byte) {
                 qemu_co_queue_wait(&req->wait_queue, NULL);
                 retry = true;
                 break;
@@ -75,10 +75,10 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,

 /* Keep track of an in-flight request */
 static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
-                                     int64_t start, int64_t end)
+                              int64_t offset, int64_t end)
 {
-    req->start = start;
-    req->end = end;
+    req->start_byte = offset;
+    req->end_byte = end;
     qemu_co_queue_init(&req->wait_queue);
     QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
 }
@@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                               sector_num * BDRV_SECTOR_SIZE,
                               nb_sectors * BDRV_SECTOR_SIZE);

-    wait_for_overlapping_requests(job, start, end);
-    cow_request_begin(&cow_request, job, start, end);
+    wait_for_overlapping_requests(job, start * job->cluster_size,
+                                  end * job->cluster_size);
+    cow_request_begin(&cow_request, job, start * job->cluster_size,
+                      end * job->cluster_size);

     for (; start < end; start++) {
         if (test_bit(start, job->done_bitmap)) {
@@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
     bitmap_zero(backup_job->done_bitmap, len);
 }

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-                                          int nb_sectors)
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+                                          uint64_t bytes)
 {
     BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
     int64_t start, end;

     assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);

-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
+    end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
     wait_for_overlapping_requests(backup_job, start, end);
 }

 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-                              int64_t sector_num,
-                              int nb_sectors)
+                              int64_t offset, uint64_t bytes)
 {
     BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
     int64_t start, end;

     assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);

-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
+    end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
     cow_request_begin(req, backup_job, start, end);
 }

diff --git a/block/replication.c b/block/replication.c
index 3885f04..8f3aba7 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -234,10 +234,14 @@ static coroutine_fn int replication_co_readv(BlockDriverState *bs,
     }

     if (job) {
-        backup_wait_for_overlapping_requests(child->bs->job, sector_num,
-                                             remaining_sectors);
-        backup_cow_request_begin(&req, child->bs->job, sector_num,
-                                 remaining_sectors);
+        uint64_t remaining_bytes = remaining_sectors * BDRV_SECTOR_SIZE;
+
+        backup_wait_for_overlapping_requests(child->bs->job,
+                                             sector_num * BDRV_SECTOR_SIZE,
+                                             remaining_bytes);
+        backup_cow_request_begin(&req, child->bs->job,
+                                 sector_num * BDRV_SECTOR_SIZE,
+                                 remaining_bytes);
         ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors,
                             qiov);
         backup_cow_request_end(&req);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (15 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 14:36   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() " Eric Blake
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v2-v4: no change
---
 block/backup.c | 62 ++++++++++++++++++++++++----------------------------------
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index cfbd921..c029d44 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -91,7 +91,7 @@ static void cow_request_end(CowRequest *req)
 }

 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-                                      int64_t sector_num, int nb_sectors,
+                                      int64_t offset, uint64_t bytes,
                                       bool *error_is_read,
                                       bool is_write_notifier)
 {
@@ -101,34 +101,28 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     QEMUIOVector bounce_qiov;
     void *bounce_buffer = NULL;
     int ret = 0;
-    int64_t sectors_per_cluster = cluster_size_sectors(job);
-    int64_t start, end; /* clusters */
+    int64_t start, end; /* bytes */
     int n; /* bytes */

     qemu_co_rwlock_rdlock(&job->flush_rwlock);

-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
+    end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);

-    trace_backup_do_cow_enter(job, start * job->cluster_size,
-                              sector_num * BDRV_SECTOR_SIZE,
-                              nb_sectors * BDRV_SECTOR_SIZE);
+    trace_backup_do_cow_enter(job, start, offset, bytes);

-    wait_for_overlapping_requests(job, start * job->cluster_size,
-                                  end * job->cluster_size);
-    cow_request_begin(&cow_request, job, start * job->cluster_size,
-                      end * job->cluster_size);
+    wait_for_overlapping_requests(job, start, end);
+    cow_request_begin(&cow_request, job, start, end);

-    for (; start < end; start++) {
-        if (test_bit(start, job->done_bitmap)) {
-            trace_backup_do_cow_skip(job, start * job->cluster_size);
+    for (; start < end; start += job->cluster_size) {
+        if (test_bit(start / job->cluster_size, job->done_bitmap)) {
+            trace_backup_do_cow_skip(job, start);
             continue; /* already copied */
         }

-        trace_backup_do_cow_process(job, start * job->cluster_size);
+        trace_backup_do_cow_process(job, start);

-        n = MIN(job->cluster_size,
-                job->common.len - start * job->cluster_size);
+        n = MIN(job->cluster_size, job->common.len - start);

         if (!bounce_buffer) {
             bounce_buffer = blk_blockalign(blk, job->cluster_size);
@@ -137,11 +131,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         iov.iov_len = n;
         qemu_iovec_init_external(&bounce_qiov, &iov, 1);

-        ret = blk_co_preadv(blk, start * job->cluster_size,
-                            bounce_qiov.size, &bounce_qiov,
+        ret = blk_co_preadv(blk, start, bounce_qiov.size, &bounce_qiov,
                             is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
+            trace_backup_do_cow_read_fail(job, start, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
@@ -149,22 +142,22 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         }

         if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
-            ret = blk_co_pwrite_zeroes(job->target, start * job->cluster_size,
+            ret = blk_co_pwrite_zeroes(job->target, start,
                                        bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
         } else {
-            ret = blk_co_pwritev(job->target, start * job->cluster_size,
+            ret = blk_co_pwritev(job->target, start,
                                  bounce_qiov.size, &bounce_qiov,
                                  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
         }
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start * job->cluster_size, ret);
+            trace_backup_do_cow_write_fail(job, start, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
             goto out;
         }

-        set_bit(start, job->done_bitmap);
+        set_bit(start / job->cluster_size, job->done_bitmap);

         /* Publish progress, guest I/O counts as progress too.  Note that the
          * offset field is an opaque progress value, it is not a disk offset.
@@ -180,8 +173,7 @@ out:

     cow_request_end(&cow_request);

-    trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
-                               nb_sectors * BDRV_SECTOR_SIZE, ret);
+    trace_backup_do_cow_return(job, offset, bytes, ret);

     qemu_co_rwlock_unlock(&job->flush_rwlock);

@@ -194,14 +186,12 @@ static int coroutine_fn backup_before_write_notify(
 {
     BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
     BdrvTrackedRequest *req = opaque;
-    int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
-    int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;

     assert(req->bs == blk_bs(job->common.blk));
-    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
+    assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));

-    return backup_do_cow(job, sector_num, nb_sectors, NULL, true);
+    return backup_do_cow(job, req->offset, req->bytes, NULL, true);
 }

 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -406,8 +396,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
                 if (yield_and_check(job)) {
                     goto out;
                 }
-                ret = backup_do_cow(job, cluster * sectors_per_cluster,
-                                    sectors_per_cluster, &error_is_read,
+                ret = backup_do_cow(job, cluster * job->cluster_size,
+                                    job->cluster_size, &error_is_read,
                                     false);
                 if ((ret < 0) &&
                     backup_error_action(job, error_is_read, -ret) ==
@@ -509,8 +499,8 @@ static void coroutine_fn backup_run(void *opaque)
             if (alloced < 0) {
                 ret = alloced;
             } else {
-                ret = backup_do_cow(job, start * sectors_per_cluster,
-                                    sectors_per_cluster, &error_is_read,
+                ret = backup_do_cow(job, start * job->cluster_size,
+                                    job->cluster_size, &error_is_read,
                                     false);
             }
             if (ret < 0) {
-- 
2.9.4


* [Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() to byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (16 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 14:43   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based Eric Blake
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of backups to track progress by bytes instead of
sectors (although we are still guaranteed to iterate in
cluster-aligned steps).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
v2-v4: no change
---
 block/backup.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c029d44..04def91 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -370,11 +370,10 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     int ret = 0;
     int clusters_per_iter;
     uint32_t granularity;
-    int64_t sector;
+    int64_t offset;
     int64_t cluster;
     int64_t end;
     int64_t last_cluster = -1;
-    int64_t sectors_per_cluster = cluster_size_sectors(job);
     BdrvDirtyBitmapIter *dbi;

     granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
@@ -382,8 +381,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);

     /* Find the next dirty sector(s) */
-    while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
-        cluster = sector / sectors_per_cluster;
+    while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
+        cluster = offset / job->cluster_size;

         /* Fake progress updates for any clusters we skipped */
         if (cluster != last_cluster + 1) {
@@ -410,7 +409,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
         /* If the bitmap granularity is smaller than the backup granularity,
          * we need to advance the iterator pointer to the next cluster. */
         if (granularity < job->cluster_size) {
-            bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
+            bdrv_set_dirty_iter(dbi,
+                                cluster * job->cluster_size / BDRV_SECTOR_SIZE);
         }

         last_cluster = cluster - 1;
@@ -432,17 +432,15 @@ static void coroutine_fn backup_run(void *opaque)
     BackupBlockJob *job = opaque;
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
-    int64_t start, end;
+    int64_t offset;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
     int ret = 0;

     QLIST_INIT(&job->inflight_reqs);
     qemu_co_rwlock_init(&job->flush_rwlock);

-    start = 0;
-    end = DIV_ROUND_UP(job->common.len, job->cluster_size);
-
-    job->done_bitmap = bitmap_new(end);
+    job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
+                                               job->cluster_size));

     job->before_write.notify = backup_before_write_notify;
     bdrv_add_before_write_notifier(bs, &job->before_write);
@@ -457,7 +455,8 @@ static void coroutine_fn backup_run(void *opaque)
         ret = backup_run_incremental(job);
     } else {
         /* Both FULL and TOP SYNC_MODE's require copying.. */
-        for (; start < end; start++) {
+        for (offset = 0; offset < job->common.len;
+             offset += job->cluster_size) {
             bool error_is_read;
             int alloced = 0;

@@ -480,8 +479,8 @@ static void coroutine_fn backup_run(void *opaque)
                      * needed but at some point that is always the case. */
                     alloced =
                         bdrv_is_allocated(bs,
-                                start * sectors_per_cluster + i,
-                                sectors_per_cluster - i, &n);
+                                          (offset >> BDRV_SECTOR_BITS) + i,
+                                          sectors_per_cluster - i, &n);
                     i += n;

                     if (alloced || n == 0) {
@@ -499,9 +498,8 @@ static void coroutine_fn backup_run(void *opaque)
             if (alloced < 0) {
                 ret = alloced;
             } else {
-                ret = backup_do_cow(job, start * job->cluster_size,
-                                    job->cluster_size, &error_is_read,
-                                    false);
+                ret = backup_do_cow(job, offset, job->cluster_size,
+                                    &error_is_read, false);
             }
             if (ret < 0) {
                 /* Depending on error action, fail now or retry cluster */
@@ -510,7 +508,7 @@ static void coroutine_fn backup_run(void *opaque)
                 if (action == BLOCK_ERROR_ACTION_REPORT) {
                     break;
                 } else {
-                    start--;
+                    offset -= job->cluster_size;
                     continue;
                 }
             }
-- 
2.9.4


* [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (17 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() " Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 16:02   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors Eric Blake
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based Eric Blake
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Juan Quintela, Dr. David Alan Gilbert

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned
on input and that *pnum is sector-aligned on return to the caller,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, this code adds usages like
DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
values, where the call might reasonably give non-aligned results
in the future; on the other hand, no rounding is needed for callers
that should just continue to work with byte alignment.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_is_allocated().  But
some code, particularly bdrv_commit(), gets a lot simpler because it
no longer has to mess with sectors; also, it is now possible to pass
NULL if the caller does not care how much of the image is allocated
beyond the initial offset.

For ease of review, bdrv_is_allocated_above() will be tackled
separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v4: rebase to vvfat TAB cleanup, R-b kept
v3: no change
v2: rebase to earlier changes, tweak commit message
---
 include/block/block.h |  4 +--
 block/backup.c        | 17 ++++---------
 block/commit.c        | 21 +++++++---------
 block/io.c            | 49 +++++++++++++++++++++++++-----------
 block/stream.c        |  5 ++--
 block/vvfat.c         | 34 ++++++++++++++-----------
 migration/block.c     |  9 ++++---
 qemu-img.c            |  5 +++-
 qemu-io-cmds.c        | 70 +++++++++++++++++++++++----------------------------
 9 files changed, 114 insertions(+), 100 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a9dc753..d3e01fb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -427,8 +427,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                     int64_t sector_num,
                                     int nb_sectors, int *pnum,
                                     BlockDriverState **file);
-int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-                      int *pnum);
+int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
+                      int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
                             int64_t sector_num, int nb_sectors, int *pnum);

diff --git a/block/backup.c b/block/backup.c
index 04def91..b2048bf 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
     QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;

-/* Size of a cluster in sectors, instead of bytes. */
-static inline int64_t cluster_size_sectors(BackupBlockJob *job)
-{
-  return job->cluster_size / BDRV_SECTOR_SIZE;
-}
-
 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
                                                        int64_t offset,
@@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
     int64_t offset;
-    int64_t sectors_per_cluster = cluster_size_sectors(job);
     int ret = 0;

     QLIST_INIT(&job->inflight_reqs);
@@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
             }

             if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
-                int i, n;
+                int i;
+                int64_t n;

                 /* Check to see if these blocks are already in the
                  * backing file. */

-                for (i = 0; i < sectors_per_cluster;) {
+                for (i = 0; i < job->cluster_size;) {
                     /* bdrv_is_allocated() only returns true/false based
                      * on the first set of sectors it comes across that
                      * are are all in the same state.
@@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
                      * backup cluster length.  We end up copying more than
                      * needed but at some point that is always the case. */
                     alloced =
-                        bdrv_is_allocated(bs,
-                                          (offset >> BDRV_SECTOR_BITS) + i,
-                                          sectors_per_cluster - i, &n);
+                        bdrv_is_allocated(bs, offset + i,
+                                          job->cluster_size - i, &n);
                     i += n;

                     if (alloced || n == 0) {
diff --git a/block/commit.c b/block/commit.c
index c3a7bca..241aa95 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -443,7 +443,7 @@ fail:
 }


-#define COMMIT_BUF_SECTORS 2048
+#define COMMIT_BUF_SIZE (2048 * BDRV_SECTOR_SIZE)

 /* commit COW file into the raw image */
 int bdrv_commit(BlockDriverState *bs)
@@ -452,8 +452,9 @@ int bdrv_commit(BlockDriverState *bs)
     BlockDriverState *backing_file_bs = NULL;
     BlockDriverState *commit_top_bs = NULL;
     BlockDriver *drv = bs->drv;
-    int64_t sector, total_sectors, length, backing_length;
-    int n, ro, open_flags;
+    int64_t offset, length, backing_length;
+    int ro, open_flags;
+    int64_t n;
     int ret = 0;
     uint8_t *buf = NULL;
     Error *local_err = NULL;
@@ -531,30 +532,26 @@ int bdrv_commit(BlockDriverState *bs)
         }
     }

-    total_sectors = length >> BDRV_SECTOR_BITS;
-
     /* blk_try_blockalign() for src will choose an alignment that works for
      * backing as well, so no need to compare the alignment manually. */
-    buf = blk_try_blockalign(src, COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
+    buf = blk_try_blockalign(src, COMMIT_BUF_SIZE);
     if (buf == NULL) {
         ret = -ENOMEM;
         goto ro_cleanup;
     }

-    for (sector = 0; sector < total_sectors; sector += n) {
-        ret = bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n);
+    for (offset = 0; offset < length; offset += n) {
+        ret = bdrv_is_allocated(bs, offset, COMMIT_BUF_SIZE, &n);
         if (ret < 0) {
             goto ro_cleanup;
         }
         if (ret) {
-            ret = blk_pread(src, sector * BDRV_SECTOR_SIZE, buf,
-                            n * BDRV_SECTOR_SIZE);
+            ret = blk_pread(src, offset, buf, n);
             if (ret < 0) {
                 goto ro_cleanup;
             }

-            ret = blk_pwrite(backing, sector * BDRV_SECTOR_SIZE, buf,
-                             n * BDRV_SECTOR_SIZE, 0);
+            ret = blk_pwrite(backing, offset, buf, n, 0);
             if (ret < 0) {
                 goto ro_cleanup;
             }
diff --git a/block/io.c b/block/io.c
index f662c97..cb40069 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1036,14 +1036,15 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
         int64_t start_sector = offset >> BDRV_SECTOR_BITS;
         int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
         unsigned int nb_sectors = end_sector - start_sector;
-        int pnum;
+        int64_t pnum;

-        ret = bdrv_is_allocated(bs, start_sector, nb_sectors, &pnum);
+        ret = bdrv_is_allocated(bs, start_sector << BDRV_SECTOR_BITS,
+                                nb_sectors << BDRV_SECTOR_BITS, &pnum);
         if (ret < 0) {
             goto out;
         }

-        if (!ret || pnum != nb_sectors) {
+        if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
             ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
             goto out;
         }
@@ -1904,15 +1905,26 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
                                        sector_num, nb_sectors, pnum, file);
 }

-int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
-                                   int nb_sectors, int *pnum)
+int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
+                                   int64_t bytes, int64_t *pnum)
 {
     BlockDriverState *file;
-    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
-                                        &file);
+    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
+    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+    int64_t ret;
+    int psectors;
+
+    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
+           bytes < INT_MAX * BDRV_SECTOR_SIZE);
+    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
+                                &file);
     if (ret < 0) {
         return ret;
     }
+    if (pnum) {
+        *pnum = psectors * BDRV_SECTOR_SIZE;
+    }
     return !!(ret & BDRV_BLOCK_ALLOCATED);
 }

@@ -1921,7 +1933,8 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
  *
  * Return true if the given sector is allocated in any image between
  * BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * sector is allocated in any image of the chain.  Return false otherwise.
+ * sector is allocated in any image of the chain.  Return false otherwise,
+ * or negative errno on failure.
  *
  * 'pnum' is set to the number of sectors (including and immediately following
  *  the specified sector) that are known to be in the same
@@ -1938,13 +1951,19 @@ int bdrv_is_allocated_above(BlockDriverState *top,

     intermediate = top;
     while (intermediate && intermediate != base) {
-        int pnum_inter;
-        ret = bdrv_is_allocated(intermediate, sector_num, nb_sectors,
+        int64_t pnum_inter;
+        int psectors_inter;
+
+        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
+                                nb_sectors * BDRV_SECTOR_SIZE,
                                 &pnum_inter);
         if (ret < 0) {
             return ret;
-        } else if (ret) {
-            *pnum = pnum_inter;
+        }
+        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
+        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
+        if (ret) {
+            *pnum = psectors_inter;
             return 1;
         }

@@ -1954,10 +1973,10 @@ int bdrv_is_allocated_above(BlockDriverState *top,
          *
          * [sector_num+x, nr_sectors] allocated.
          */
-        if (n > pnum_inter &&
+        if (n > psectors_inter &&
             (intermediate == top ||
-             sector_num + pnum_inter < intermediate->total_sectors)) {
-            n = pnum_inter;
+             sector_num + psectors_inter < intermediate->total_sectors)) {
+            n = psectors_inter;
         }

         intermediate = backing_bs(intermediate);
diff --git a/block/stream.c b/block/stream.c
index e3dd2ac..e5f2a08 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -137,6 +137,7 @@ static void coroutine_fn stream_run(void *opaque)

     for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
         bool copy;
+        int64_t count = 0;

         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
@@ -148,8 +149,8 @@ static void coroutine_fn stream_run(void *opaque)

         copy = false;

-        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
-                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
+        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
diff --git a/block/vvfat.c b/block/vvfat.c
index f55104c..4fd28e1 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1482,24 +1482,27 @@ static int vvfat_read(BlockDriverState *bs, int64_t sector_num,
         if (sector_num >= bs->total_sectors)
            return -1;
         if (s->qcow) {
-            int n;
+            int64_t n;
             int ret;
-            ret = bdrv_is_allocated(s->qcow->bs, sector_num,
-                                    nb_sectors - i, &n);
+            ret = bdrv_is_allocated(s->qcow->bs, sector_num * BDRV_SECTOR_SIZE,
+                                    (nb_sectors - i) * BDRV_SECTOR_SIZE, &n);
             if (ret < 0) {
                 return ret;
             }
             if (ret) {
-                DLOG(fprintf(stderr, "sectors %d+%d allocated\n",
-                             (int)sector_num, n));
-                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200, n)) {
+                DLOG(fprintf(stderr, "sectors %" PRId64 "+%" PRId64
+                             " allocated\n", sector_num,
+                             n >> BDRV_SECTOR_BITS));
+                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200,
+                              n >> BDRV_SECTOR_BITS)) {
                     return -1;
                 }
-                i += n - 1;
-                sector_num += n - 1;
+                i += (n >> BDRV_SECTOR_BITS) - 1;
+                sector_num += (n >> BDRV_SECTOR_BITS) - 1;
                 continue;
             }
-DLOG(fprintf(stderr, "sector %d not allocated\n", (int)sector_num));
+            DLOG(fprintf(stderr, "sector %" PRId64 " not allocated\n",
+                         sector_num));
         }
         if (sector_num < s->offset_to_root_dir) {
             if (sector_num < s->offset_to_fat) {
@@ -1779,7 +1782,7 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,
                                         uint32_t cluster_num)
 {
     int was_modified = 0;
-    int i, dummy;
+    int i;

     if (s->qcow == NULL) {
         return 0;
@@ -1787,8 +1790,9 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,

     for (i = 0; !was_modified && i < s->sectors_per_cluster; i++) {
         was_modified = bdrv_is_allocated(s->qcow->bs,
-                                         cluster2sector(s, cluster_num) + i,
-                                         1, &dummy);
+                                         (cluster2sector(s, cluster_num) +
+                                          i) * BDRV_SECTOR_SIZE,
+                                         BDRV_SECTOR_SIZE, NULL);
     }

     /*
@@ -1935,7 +1939,7 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
             }

             if (copy_it) {
-                int i, dummy;
+                int i;
                 /*
                  * This is horribly inefficient, but that is okay, since
                  * it is rarely executed, if at all.
@@ -1946,7 +1950,9 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
                 for (i = 0; i < s->sectors_per_cluster; i++) {
                     int res;

-                    res = bdrv_is_allocated(s->qcow->bs, offset + i, 1, &dummy);
+                    res = bdrv_is_allocated(s->qcow->bs,
+                                            (offset + i) * BDRV_SECTOR_SIZE,
+                                            BDRV_SECTOR_SIZE, NULL);
                     if (res < 0) {
                         return -1;
                     }
diff --git a/migration/block.c b/migration/block.c
index 7674ae1..4a48e5c 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -34,7 +34,7 @@
 #define BLK_MIG_FLAG_PROGRESS           0x04
 #define BLK_MIG_FLAG_ZERO_BLOCK         0x08

-#define MAX_IS_ALLOCATED_SEARCH 65536
+#define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)

 #define MAX_INFLIGHT_IO 512

@@ -267,6 +267,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
     BlockBackend *bb = bmds->blk;
     BlkMigBlock *blk;
     int nr_sectors;
+    int64_t count;

     if (bmds->shared_base) {
         qemu_mutex_lock_iothread();
@@ -274,9 +275,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
         /* Skip unallocated sectors; intentionally treats failure as
          * an allocated sector */
         while (cur_sector < total_sectors &&
-               !bdrv_is_allocated(blk_bs(bb), cur_sector,
-                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
-            cur_sector += nr_sectors;
+               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
+                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
+            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
         }
         aio_context_release(blk_get_aio_context(bb));
         qemu_mutex_unlock_iothread();
diff --git a/qemu-img.c b/qemu-img.c
index d5a1560..5271b41 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv)
         int64_t new_backing_num_sectors = 0;
         uint64_t sector;
         int n;
+        int64_t count;
         float local_progress = 0;

         buf_old = blk_blockalign(blk, IO_BUF_SIZE);
@@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv)
             }

             /* If the cluster is allocated, we don't need to take action */
-            ret = bdrv_is_allocated(bs, sector, n, &n);
+            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
+                                    n << BDRV_SECTOR_BITS, &count);
             if (ret < 0) {
                 error_report("error while reading image metadata: %s",
                              strerror(-ret));
                 goto out;
             }
+            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
             if (ret) {
                 continue;
             }
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index b0ea327..e529b8f 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1760,12 +1760,12 @@ out:
 static int alloc_f(BlockBackend *blk, int argc, char **argv)
 {
     BlockDriverState *bs = blk_bs(blk);
-    int64_t offset, sector_num, nb_sectors, remaining, count;
+    int64_t offset, start, remaining, count;
     char s1[64];
-    int num, ret;
-    int64_t sum_alloc;
+    int ret;
+    int64_t num, sum_alloc;

-    offset = cvtnum(argv[1]);
+    start = offset = cvtnum(argv[1]);
     if (offset < 0) {
         print_cvtnum_err(offset, argv[1]);
         return 0;
@@ -1793,32 +1793,30 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
                count);
         return 0;
     }
-    nb_sectors = count >> BDRV_SECTOR_BITS;

-    remaining = nb_sectors;
+    remaining = count;
     sum_alloc = 0;
-    sector_num = offset >> 9;
     while (remaining) {
-        ret = bdrv_is_allocated(bs, sector_num, remaining, &num);
+        ret = bdrv_is_allocated(bs, offset, remaining, &num);
         if (ret < 0) {
             printf("is_allocated failed: %s\n", strerror(-ret));
             return 0;
         }
-        sector_num += num;
+        offset += num;
         remaining -= num;
         if (ret) {
             sum_alloc += num;
         }
         if (num == 0) {
-            nb_sectors -= remaining;
+            count -= remaining;
             remaining = 0;
         }
     }

-    cvtstr(offset, s1, sizeof(s1));
+    cvtstr(start, s1, sizeof(s1));

     printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
-           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
+           sum_alloc, count, s1);
     return 0;
 }

@@ -1833,14 +1831,15 @@ static const cmdinfo_t alloc_cmd = {
 };


-static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
-                            int64_t nb_sectors, int64_t *pnum)
+static int map_is_allocated(BlockDriverState *bs, int64_t offset,
+                            int64_t bytes, int64_t *pnum)
 {
-    int num, num_checked;
+    int64_t num;
+    int num_checked;
     int ret, firstret;

-    num_checked = MIN(nb_sectors, INT_MAX);
-    ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+    num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
+    ret = bdrv_is_allocated(bs, offset, num_checked, &num);
     if (ret < 0) {
         return ret;
     }
@@ -1848,12 +1847,12 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
     firstret = ret;
     *pnum = num;

-    while (nb_sectors > 0 && ret == firstret) {
-        sector_num += num;
-        nb_sectors -= num;
+    while (bytes > 0 && ret == firstret) {
+        offset += num;
+        bytes -= num;

-        num_checked = MIN(nb_sectors, INT_MAX);
-        ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+        num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
+        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
         if (ret == firstret && num) {
             *pnum += num;
         } else {
@@ -1866,25 +1865,21 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,

 static int map_f(BlockBackend *blk, int argc, char **argv)
 {
-    int64_t offset;
-    int64_t nb_sectors, total_sectors;
+    int64_t offset, bytes;
     char s1[64], s2[64];
     int64_t num;
     int ret;
     const char *retstr;

     offset = 0;
-    total_sectors = blk_nb_sectors(blk);
-    if (total_sectors < 0) {
-        error_report("Failed to query image length: %s",
-                     strerror(-total_sectors));
+    bytes = blk_getlength(blk);
+    if (bytes < 0) {
+        error_report("Failed to query image length: %s", strerror(-bytes));
         return 0;
     }

-    nb_sectors = total_sectors;
-
-    do {
-        ret = map_is_allocated(blk_bs(blk), offset, nb_sectors, &num);
+    while (bytes) {
+        ret = map_is_allocated(blk_bs(blk), offset, bytes, &num);
         if (ret < 0) {
             error_report("Failed to get allocation status: %s", strerror(-ret));
             return 0;
@@ -1894,15 +1889,14 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
         }

         retstr = ret ? "    allocated" : "not allocated";
-        cvtstr(num << BDRV_SECTOR_BITS, s1, sizeof(s1));
-        cvtstr(offset << BDRV_SECTOR_BITS, s2, sizeof(s2));
+        cvtstr(num, s1, sizeof(s1));
+        cvtstr(offset, s2, sizeof(s2));
         printf("%s (0x%" PRIx64 ") bytes %s at offset %s (0x%" PRIx64 ")\n",
-               s1, num << BDRV_SECTOR_BITS, retstr,
-               s2, offset << BDRV_SECTOR_BITS);
+               s1, num, retstr, s2, offset);

         offset += num;
-        nb_sectors -= num;
-    } while (offset < total_sectors);
+        bytes -= num;
+    };

     return 0;
 }
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (18 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06  0:23   ` John Snow
  2017-07-06 16:48   ` Kevin Wolf
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based Eric Blake
  20 siblings, 2 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, jcody, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz

bdrv_is_allocated_above() was relying on intermediate->total_sectors,
which is a field that can have stale contents depending on the value
of intermediate->has_variable_length.  An audit shows that we are safe
(we were first calling through bdrv_co_get_block_status() which in
turn calls bdrv_nb_sectors() and therefore just refreshed the current
length), but it's nicer to favor our accessor functions to avoid having
to repeat such an audit, even if it means refresh_total_sectors() is
called more frequently.
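The hazard being avoided can be sketched outside QEMU: a raw field caches a
length that may go stale, while an accessor re-queries before returning. This
is an illustrative toy, not QEMU code; `State`, `real_sectors` and the helper
names are hypothetical stand-ins for the BDS field and refresh_total_sectors().

```c
#include <assert.h>

/* Toy model: total_sectors caches a length that can go stale when the
 * underlying image grows; real_sectors stands in for what a fresh query
 * of the image would report. */
typedef struct {
    long total_sectors;   /* cached copy, possibly stale */
    long real_sectors;    /* ground truth a refresh would discover */
} State;

/* Stand-in for refresh_total_sectors(): re-query and update the cache. */
static long refresh_total_sectors(State *s)
{
    s->total_sectors = s->real_sectors;
    return s->total_sectors;
}

/* Preferred pattern: always read the length through the accessor, so no
 * call site has to be audited for whether the cache was refreshed. */
static long nb_sectors(State *s)
{
    return refresh_total_sectors(s);
}
```

Reading `s->total_sectors` directly would return 4 in the scenario below even
though the image has grown to 8 sectors; the accessor cannot go stale.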

Suggested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v3-v4: no change
v2: new patch
---
 block/io.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index cb40069..fb8d1c7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1952,6 +1952,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
     intermediate = top;
     while (intermediate && intermediate != base) {
         int64_t pnum_inter;
+        int64_t size_inter;
         int psectors_inter;

         ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
@@ -1969,13 +1970,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,

         /*
          * [sector_num, nb_sectors] is unallocated on top but intermediate
-         * might have
-         *
-         * [sector_num+x, nr_sectors] allocated.
+         * might have [sector_num+x, nb_sectors-x] allocated.
          */
+        size_inter = bdrv_nb_sectors(intermediate);
+        if (size_inter < 0) {
+            return size_inter;
+        }
         if (n > psectors_inter &&
-            (intermediate == top ||
-             sector_num + psectors_inter < intermediate->total_sectors)) {
+            (intermediate == top || sector_num + psectors_inter < size_inter)) {
             n = psectors_inter;
         }

-- 
2.9.4


* [Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based
  2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (19 preceding siblings ...)
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors Eric Blake
@ 2017-07-05 21:08 ` Eric Blake
  2017-07-06 17:13   ` Kevin Wolf
  20 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-05 21:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Wen Congyang, Xie Changlong

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly stream_run(),
gets a lot simpler because it no longer has to mess with sectors.

For ease of review, bdrv_is_allocated() was tackled separately.
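The "scaling at the callers followed by inverse scaling" pattern can be shown
in isolation. This is a minimal sketch, not QEMU code: the BDRV_SECTOR_*
constants match QEMU's definitions, but the helper functions are hypothetical
stand-ins for the inline `* BDRV_SECTOR_SIZE` and `DIV_ROUND_UP()` expressions
used at the converted call sites.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define BDRV_SECTOR_SIZE (1LL << BDRV_SECTOR_BITS)   /* 512 bytes */

/* Scaling at the caller: a sector-based quantity becomes byte-based. */
static int64_t sectors_to_bytes(int64_t sectors)
{
    return sectors * BDRV_SECTOR_SIZE;
}

/* Inverse scaling on the result: round a byte count up to whole sectors,
 * so a partial sector at end of an unaligned file still counts as one
 * sector for the remaining sector-based callers. */
static int64_t bytes_to_sectors_round_up(int64_t bytes)
{
    return (bytes + BDRV_SECTOR_SIZE - 1) / BDRV_SECTOR_SIZE;
}
```

The round-up matters: a 513-byte allocated range must report 2 sectors, not 1,
or a sector-based loop would fail to advance past the tail of the range.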

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> [replication part]
Reviewed-by: Jeff Cody <jcody@redhat.com>

---
v3-v4: no change
v2: tweak function comments, favor bdrv_getlength() over ->total_sectors
---
 include/block/block.h |  2 +-
 block/commit.c        | 20 ++++++++------------
 block/io.c            | 42 ++++++++++++++++++++----------------------
 block/mirror.c        |  5 ++++-
 block/replication.c   | 17 ++++++++++++-----
 block/stream.c        | 21 +++++++++------------
 qemu-img.c            | 10 +++++++---
 7 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index d3e01fb..f0fdbe8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -430,7 +430,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
-                            int64_t sector_num, int nb_sectors, int *pnum);
+                            int64_t offset, int64_t bytes, int64_t *pnum);

 bool bdrv_is_read_only(BlockDriverState *bs);
 bool bdrv_is_writable(BlockDriverState *bs);
diff --git a/block/commit.c b/block/commit.c
index 241aa95..774a8a5 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -146,7 +146,7 @@ static void coroutine_fn commit_run(void *opaque)
     int64_t offset;
     uint64_t delay_ns = 0;
     int ret = 0;
-    int n = 0; /* sectors */
+    int64_t n = 0; /* bytes */
     void *buf = NULL;
     int bytes_written = 0;
     int64_t base_len;
@@ -171,7 +171,7 @@ static void coroutine_fn commit_run(void *opaque)

     buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-    for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
+    for (offset = 0; offset < s->common.len; offset += n) {
         bool copy;

         /* Note that even when no rate limit is applied we need to yield
@@ -183,15 +183,12 @@ static void coroutine_fn commit_run(void *opaque)
         }
         /* Copy if allocated above the base */
         ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-                                      offset / BDRV_SECTOR_SIZE,
-                                      COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
-                                      &n);
+                                      offset, COMMIT_BUFFER_SIZE, &n);
         copy = (ret == 1);
-        trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
+        trace_commit_one_iteration(s, offset, n, ret);
         if (copy) {
-            ret = commit_populate(s->top, s->base, offset,
-                                  n * BDRV_SECTOR_SIZE, buf);
-            bytes_written += n * BDRV_SECTOR_SIZE;
+            ret = commit_populate(s->top, s->base, offset, n, buf);
+            bytes_written += n;
         }
         if (ret < 0) {
             BlockErrorAction action =
@@ -204,11 +201,10 @@ static void coroutine_fn commit_run(void *opaque)
             }
         }
         /* Publish progress */
-        s->common.offset += n * BDRV_SECTOR_SIZE;
+        s->common.offset += n;

         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit,
-                                                 n * BDRV_SECTOR_SIZE);
+            delay_ns = ratelimit_calculate_delay(&s->limit, n);
         }
     }

diff --git a/block/io.c b/block/io.c
index fb8d1c7..569c503 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1931,54 +1931,52 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
- * Return true if the given sector is allocated in any image between
- * BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * sector is allocated in any image of the chain.  Return false otherwise,
+ * Return true if the (prefix of the) given range is allocated in any image
+ * between BASE and TOP (inclusive).  BASE can be NULL to check if the given
+ * offset is allocated in any image of the chain.  Return false otherwise,
  * or negative errno on failure.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- *  the specified sector) that are known to be in the same
- *  allocated/unallocated state.
+ * 'pnum' is set to the number of bytes (including and immediately
+ * following the specified offset) that are known to be in the same
+ * allocated/unallocated state.  Note that a subsequent call starting
+ * at 'offset + *pnum' may return the same allocation status (in other
+ * words, the result is not necessarily the maximum possible range);
+ * but 'pnum' will only be 0 when end of file is reached.
  *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
                             BlockDriverState *base,
-                            int64_t sector_num,
-                            int nb_sectors, int *pnum)
+                            int64_t offset, int64_t bytes, int64_t *pnum)
 {
     BlockDriverState *intermediate;
-    int ret, n = nb_sectors;
+    int ret;
+    int64_t n = bytes;

     intermediate = top;
     while (intermediate && intermediate != base) {
         int64_t pnum_inter;
         int64_t size_inter;
-        int psectors_inter;

-        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
-                                nb_sectors * BDRV_SECTOR_SIZE,
-                                &pnum_inter);
+        ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
         if (ret < 0) {
             return ret;
         }
-        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
-        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
         if (ret) {
-            *pnum = psectors_inter;
+            *pnum = pnum_inter;
             return 1;
         }

         /*
-         * [sector_num, nb_sectors] is unallocated on top but intermediate
-         * might have [sector_num+x, nb_sectors-x] allocated.
+         * [offset, bytes] is unallocated on top but intermediate
+         * might have [offset+x, bytes-x] allocated.
          */
-        size_inter = bdrv_nb_sectors(intermediate);
+        size_inter = bdrv_getlength(intermediate);
         if (size_inter < 0) {
             return size_inter;
         }
-        if (n > psectors_inter &&
-            (intermediate == top || sector_num + psectors_inter < size_inter)) {
-            n = psectors_inter;
+        if (n > pnum_inter &&
+            (intermediate == top || offset + pnum_inter < size_inter)) {
+            n = pnum_inter;
         }

         intermediate = backing_bs(intermediate);
diff --git a/block/mirror.c b/block/mirror.c
index f54a8d7..c717f60 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -621,6 +621,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
     BlockDriverState *bs = s->source;
     BlockDriverState *target_bs = blk_bs(s->target);
     int ret, n;
+    int64_t count;

     end = s->bdev_length / BDRV_SECTOR_SIZE;

@@ -670,11 +671,13 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             return 0;
         }

-        ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
+        ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
+                                      nb_sectors * BDRV_SECTOR_SIZE, &count);
         if (ret < 0) {
             return ret;
         }

+        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
         assert(n > 0);
         if (ret == 1) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
diff --git a/block/replication.c b/block/replication.c
index 8f3aba7..bf4462c 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -264,7 +264,8 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
     BdrvChild *top = bs->file;
     BdrvChild *base = s->secondary_disk;
     BdrvChild *target;
-    int ret, n;
+    int ret;
+    int64_t n;

     ret = replication_get_io_status(s);
     if (ret < 0) {
@@ -283,14 +284,20 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
      */
     qemu_iovec_init(&hd_qiov, qiov->niov);
     while (remaining_sectors > 0) {
-        ret = bdrv_is_allocated_above(top->bs, base->bs, sector_num,
-                                      remaining_sectors, &n);
+        int64_t count;
+
+        ret = bdrv_is_allocated_above(top->bs, base->bs,
+                                      sector_num * BDRV_SECTOR_SIZE,
+                                      remaining_sectors * BDRV_SECTOR_SIZE,
+                                      &count);
         if (ret < 0) {
             goto out1;
         }

+        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
+        n = count >> BDRV_SECTOR_BITS;
         qemu_iovec_reset(&hd_qiov);
-        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, count);

         target = ret ? top : base;
         ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
@@ -300,7 +307,7 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,

         remaining_sectors -= n;
         sector_num += n;
-        bytes_done += n * BDRV_SECTOR_SIZE;
+        bytes_done += count;
     }

 out1:
diff --git a/block/stream.c b/block/stream.c
index e5f2a08..e6f7234 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -111,7 +111,7 @@ static void coroutine_fn stream_run(void *opaque)
     uint64_t delay_ns = 0;
     int error = 0;
     int ret = 0;
-    int n = 0; /* sectors */
+    int64_t n = 0; /* bytes */
     void *buf;

     if (!bs->backing) {
@@ -135,9 +135,8 @@ static void coroutine_fn stream_run(void *opaque)
         bdrv_enable_copy_on_read(bs);
     }

-    for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
+    for ( ; offset < s->common.len; offset += n) {
         bool copy;
-        int64_t count = 0;

         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
@@ -149,26 +148,25 @@ static void coroutine_fn stream_run(void *opaque)

         copy = false;

-        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
-        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
+        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
             ret = bdrv_is_allocated_above(backing_bs(bs), base,
-                                          offset / BDRV_SECTOR_SIZE, n, &n);
+                                          offset, n, &n);

             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
-                n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
+                n = s->common.len - offset;
             }

             copy = (ret == 1);
         }
-        trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
+        trace_stream_one_iteration(s, offset, n, ret);
         if (copy) {
-            ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
+            ret = stream_populate(blk, offset, n, buf);
         }
         if (ret < 0) {
             BlockErrorAction action =
@@ -187,10 +185,9 @@ static void coroutine_fn stream_run(void *opaque)
         ret = 0;

         /* Publish progress */
-        s->common.offset += n * BDRV_SECTOR_SIZE;
+        s->common.offset += n;
         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit,
-                                                 n * BDRV_SECTOR_SIZE);
+            delay_ns = ratelimit_calculate_delay(&s->limit, n);
         }
     }

diff --git a/qemu-img.c b/qemu-img.c
index 5271b41..960f42a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1477,12 +1477,16 @@ static int img_compare(int argc, char **argv)
         }

         for (;;) {
+            int64_t count;
+
             nb_sectors = sectors_to_process(total_sectors_over, sector_num);
             if (nb_sectors <= 0) {
                 break;
             }
-            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL, sector_num,
-                                          nb_sectors, &pnum);
+            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          nb_sectors * BDRV_SECTOR_SIZE,
+                                          &count);
             if (ret < 0) {
                 ret = 3;
                 error_report("Sector allocation test failed for %s",
@@ -1490,7 +1494,7 @@ static int img_compare(int argc, char **argv)
                 goto out;

             }
-            nb_sectors = pnum;
+            nb_sectors = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
             if (ret) {
                 ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
                                           filename_over, buf1, quiet);
-- 
2.9.4


* Re: [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete() Eric Blake
@ 2017-07-06  0:05   ` John Snow
  2017-07-06 10:38   ` Kevin Wolf
  1 sibling, 0 replies; 46+ messages in thread
From: John Snow @ 2017-07-06  0:05 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, jcody, qemu-block, Max Reitz



On 07/05/2017 05:08 PM, Eric Blake wrote:
> stream_complete() skips the work of rewriting the backing file if
> the job was cancelled, if data->reached_end is false, or if there
> was an error detected (non-zero data->ret) during the streaming.
> But note that in stream_run(), data->reached_end is only set if the
> loop ran to completion, and data->ret is only 0 in two cases:
> either the loop ran to completion (possibly by cancellation, but
> stream_complete checks for that), or we took an early goto out
> because there is no bs->backing.  Thus, we can preserve the same
> semantics without the use of reached_end, by merely checking for
> bs->backing (and logically, if there was no backing file, streaming
> is a no-op, so there is no backing file to rewrite).
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v4: new patch
> ---
>  block/stream.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 746d525..12f1659 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -59,7 +59,6 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
> 
>  typedef struct {
>      int ret;
> -    bool reached_end;
>  } StreamCompleteData;
> 
>  static void stream_complete(BlockJob *job, void *opaque)
> @@ -70,7 +69,7 @@ static void stream_complete(BlockJob *job, void *opaque)
>      BlockDriverState *base = s->base;
>      Error *local_err = NULL;
> 
> -    if (!block_job_is_cancelled(&s->common) && data->reached_end &&
> +    if (!block_job_is_cancelled(&s->common) && bs->backing &&
>          data->ret == 0) {
>          const char *base_id = NULL, *base_fmt = NULL;
>          if (base) {
> @@ -211,7 +210,6 @@ out:
>      /* Modify backing chain and close BDSes in main loop */
>      data = g_malloc(sizeof(*data));
>      data->ret = ret;
> -    data->reached_end = sector_num == end;
>      block_job_defer_to_main_loop(&s->common, stream_complete, data);
>  }
> 

This seems ever so slightly less intuitive to me, but it is functionally
identical.

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob " Eric Blake
@ 2017-07-06  0:14   ` John Snow
  2017-07-06 10:42   ` Kevin Wolf
  1 sibling, 0 replies; 46+ messages in thread
From: John Snow @ 2017-07-06  0:14 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, jcody, qemu-block, Max Reitz



On 07/05/2017 05:08 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to the
> buffer size.
> 
> Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
> (necessary for interaction with sector-based dirty bitmaps, until
> a later patch converts those to be byte-based) does not suffer from
> truncation problems.
> 
> [checkpatch has a false positive on use of MIN() in this patch]
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors Eric Blake
@ 2017-07-06  0:23   ` John Snow
  2017-07-06 16:48   ` Kevin Wolf
  1 sibling, 0 replies; 46+ messages in thread
From: John Snow @ 2017-07-06  0:23 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, jcody, Max Reitz, Stefan Hajnoczi



On 07/05/2017 05:08 PM, Eric Blake wrote:
> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
> which is a field that can have stale contents depending on the value
> of intermediate->has_variable_length.  An audit shows that we are safe
> (we were first calling through bdrv_co_get_block_status() which in
> turn calls bdrv_nb_sectors() and therefore just refreshed the current
> length), but it's nicer to favor our accessor functions to avoid having
> to repeat such an audit, even if it means refresh_total_sectors() is
> called more frequently.
> 
> Suggested-by: John Snow <jsnow@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

* Re: [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete() Eric Blake
  2017-07-06  0:05   ` John Snow
@ 2017-07-06 10:38   ` Kevin Wolf
  1 sibling, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 10:38 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> stream_complete() skips the work of rewriting the backing file if
> the job was cancelled, if data->reached_end is false, or if there
> was an error detected (non-zero data->ret) during the streaming.
> But note that in stream_run(), data->reached_end is only set if the
> loop ran to completion, and data->ret is only 0 in two cases:
> either the loop ran to completion (possibly by cancellation, but
> stream_complete checks for that), or we took an early goto out
> because there is no bs->backing.  Thus, we can preserve the same
> semantics without the use of reached_end, by merely checking for
> bs->backing (and logically, if there was no backing file, streaming
> is a no-op, so there is no backing file to rewrite).
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based Eric Blake
@ 2017-07-06 10:39   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 10:39 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of streaming to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are sector-aligned).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob " Eric Blake
  2017-07-06  0:14   ` John Snow
@ 2017-07-06 10:42   ` Kevin Wolf
  1 sibling, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 10:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to the
> buffer size.
> 
> Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
> (necessary for interaction with sector-based dirty bitmaps, until
> a later patch converts those to be byte-based) does not suffer from
> truncation problems.
> 
> [checkpatch has a false positive on use of MIN() in this patch]
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based Eric Blake
@ 2017-07-06 11:16   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 11:16 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change), and add mirror_clip_bytes() as a
> counterpart to mirror_clip_sectors().  Some of the conversion is
> a bit tricky, requiring temporaries to convert between units; it
> will be cleared up in a following patch.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

> -    if (align_nb_sectors > max_sectors) {
> -        align_nb_sectors = max_sectors;
> +    if (align_bytes > max_bytes) {
> +        align_bytes = max_bytes;
>          if (need_cow) {
> -            align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
> -                                               s->target_cluster_size >>
> -                                               BDRV_SECTOR_BITS);
> +            align_bytes = QEMU_ALIGN_DOWN(align_bytes,
> +                                          s->target_cluster_size);

This would fit in a single line, if you have to respin for some reason.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() " Eric Blake
@ 2017-07-06 13:30   ` Kevin Wolf
  2017-07-06 14:25     ` Eric Blake
  0 siblings, 1 reply; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 13:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

> -    /* The sector range must meet granularity because:
> +    assert(bytes <= s->buf_size);
> +    /* The range will be sector-aligned because:
>       * 1) Caller passes in aligned values;
> -     * 2) mirror_cow_align is used only when target cluster is larger. */
> -    assert(!(sector_num % sectors_per_chunk));
> -    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> +     * 2) mirror_cow_align is used only when target cluster is larger.
> +     * But it might not be cluster-aligned at end-of-file. */
> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> +    nb_chunks = DIV_ROUND_UP(bytes, s->granularity);

The assertion in the old code was about sector_num (i.e.  that the start
of the range is cluster-aligned), not about nb_sectors, so while you add
a new assertion, it is asserting something different. This explains
why you had to switch to sector aligned even though the semantics
shouldn't be changed by this patch.

Is this intentional or did you confuse sector_num and nb_sectors? I
think we can just have both assertions.

The rest of the patch looks okay.

Kevin

* Re: [Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() " Eric Blake
@ 2017-07-06 13:47   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 13:47 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of mirroring to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are both sector-aligned and multiples of the granularity).  Drop
> the now-unused mirror_clip_sectors().
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters()
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters() Eric Blake
@ 2017-07-06 13:49   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 13:49 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Stefan Hajnoczi, Fam Zheng,
	Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> Now that the last user [mirror_iteration()] has converted to using
> bytes, we no longer need a function to round sectors to clusters.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based Eric Blake
@ 2017-07-06 13:59   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 13:59 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to
> tracking progress.  Drop a redundant local variable bytes_per_cluster.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h " Eric Blake
@ 2017-07-06 14:11   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 14:11 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz, Wen Congyang,
	Xie Changlong

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> the public interface to backup jobs (no semantic change), including
> a change to CowRequest to track by bytes instead of cluster indices.
> 
> Note that this does not change the difference between the public
> interface (starting point, and size of the subsequent range) and
> the internal interface (starting and end points).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

> diff --git a/block/backup.c b/block/backup.c
> index 4e64710..cfbd921 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -55,7 +55,7 @@ static inline int64_t cluster_size_sectors(BackupBlockJob *job)
> 
>  /* See if in-flight requests overlap and wait for them to complete */
>  static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> -                                                       int64_t start,
> +                                                       int64_t offset,
>                                                         int64_t end)

Not sure how useful this rename is. I think I would have renamed either
both or probably actually none of the parameters. start/end is a nice
pair, offset/end is kind of unexpected. start_offset/end_offset would be
an option if you do want to rename, but I don't see a need for it in
this very short function.

>  {
>      CowRequest *req;
> @@ -64,7 +64,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
>      do {
>          retry = false;
>          QLIST_FOREACH(req, &job->inflight_reqs, list) {
> -            if (end > req->start && start < req->end) {
> +            if (end > req->start_byte && offset < req->end_byte) {
>                  qemu_co_queue_wait(&req->wait_queue, NULL);
>                  retry = true;
>                  break;
> @@ -75,10 +75,10 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> 
>  /* Keep track of an in-flight request */
>  static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
> -                                     int64_t start, int64_t end)
> +                              int64_t offset, int64_t end)

Same here.

But again, not a correctness issue.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() to byte-based
  2017-07-06 13:30   ` Kevin Wolf
@ 2017-07-06 14:25     ` Eric Blake
  2017-07-06 14:55       ` Kevin Wolf
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-06 14:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 07/06/2017 08:30 AM, Kevin Wolf wrote:
> On 05.07.2017 at 23:08, Eric Blake wrote:
>> We are gradually converting to byte-based interfaces, as they are
>> easier to reason about than sector-based.  Convert another internal
>> function (no semantic change).
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>> Reviewed-by: Jeff Cody <jcody@redhat.com>
> 
>> -    /* The sector range must meet granularity because:
>> +    assert(bytes <= s->buf_size);
>> +    /* The range will be sector-aligned because:
>>       * 1) Caller passes in aligned values;
>> -     * 2) mirror_cow_align is used only when target cluster is larger. */
>> -    assert(!(sector_num % sectors_per_chunk));

So the strict translation would be assert(!(offset % s->granularity)),
or rewritten to be assert(QEMU_IS_ALIGNED(offset, s->granularity)).

>> -    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
>> +     * 2) mirror_cow_align is used only when target cluster is larger.
>> +     * But it might not be cluster-aligned at end-of-file. */
>> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));

and you're right that this appears to be a new assertion.

>> +    nb_chunks = DIV_ROUND_UP(bytes, s->granularity);
> 
> The assertion in the old code was about sector_num (i.e.  that the start
> of the range is cluster-aligned), not about nb_sectors, so while you add
> a new assertion, it is asserting something different. This explains
> why you had to switch to sector aligned even though the semantics
> shouldn't be changed by this patch.
> 
> Is this intentional or did you confuse sector_num and nb_sectors? I
> think we can just have both assertions.

At this point, I'm not sure if I confused things, or if I hit an actual
iotest failure later in the series that I traced back to this point.  If
I have a reason to respin, I'll see if both assertions still hold (the
rewritten alignment check on offset, and the new check on bytes being
sector-aligned), through all the tests.  Also, while asserting that
bytes is sector-aligned is good in the short term, it may be overkill by
the time dirty-bitmap is changed to byte alignment (right now,
bdrv_getlength() is always sector-aligned, because it rounds up; but if
we ever make it byte-accurate, then the trailing cluster might not be a
sector-aligned byte length).

> 
> The rest of the patch looks okay.
> 
> Kevin
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() " Eric Blake
@ 2017-07-06 14:36   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 14:36 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() to byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() " Eric Blake
@ 2017-07-06 14:43   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 14:43 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Change the internal
> loop iteration of backups to track by bytes instead of sectors
> (although we are still guaranteed that we iterate by steps that
> are cluster-aligned).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() to byte-based
  2017-07-06 14:25     ` Eric Blake
@ 2017-07-06 14:55       ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 14:55 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz

On 06.07.2017 at 16:25, Eric Blake wrote:
> On 07/06/2017 08:30 AM, Kevin Wolf wrote:
> > On 05.07.2017 at 23:08, Eric Blake wrote:
> >> We are gradually converting to byte-based interfaces, as they are
> >> easier to reason about than sector-based.  Convert another internal
> >> function (no semantic change).
> >>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> >> Reviewed-by: John Snow <jsnow@redhat.com>
> >> Reviewed-by: Jeff Cody <jcody@redhat.com>
> > 
> >> -    /* The sector range must meet granularity because:
> >> +    assert(bytes <= s->buf_size);
> >> +    /* The range will be sector-aligned because:
> >>       * 1) Caller passes in aligned values;
> >> -     * 2) mirror_cow_align is used only when target cluster is larger. */
> >> -    assert(!(sector_num % sectors_per_chunk));
> 
> So the strict translation would be assert(!(offset % s->granularity)),
> or rewritten to be assert(QEMU_IS_ALIGNED(offset, s->granularity)).

Right.

> >> -    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> >> +     * 2) mirror_cow_align is used only when target cluster is larger.
> >> +     * But it might not be cluster-aligned at end-of-file. */
> >> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> 
> and you're right that this appears to be a new assertion.
> 
> >> +    nb_chunks = DIV_ROUND_UP(bytes, s->granularity);
> > 
> > The assertion in the old code was about sector_num (i.e.  that the start
> > of the range is cluster-aligned), not about nb_sectors, so while you add
> > a new assertion, it is asserting something different. This explains
> > why you had to switch to sector aligned even though the semantics
> > shouldn't be changed by this patch.
> > 
> > Is this intentional or did you confuse sector_num and nb_sectors? I
> > think we can just have both assertions.
> 
> At this point, I'm not sure if I confused things, or if I hit an actual
> iotest failure later in the series that I traced back to this point.

But if you did, wouldn't this indicate a real bug in the patch?

> If I have a reason to respin, I'll see if both assertions still hold
> (the rewritten alignment check on offset, and the new check on bytes
> being sector-aligned), through all the tests.

Actually, I think this one, while seemingly minor, is a reason to
respin. This kind of assertion exists exactly to prevent bugs when
doing changes like this patch does, so removing the assertion in such a
patch looks really suspicious.

Also, with the new assertion, the comment doesn't really make that much
sense any more either.

> Also, while asserting that bytes is sector-aligned is good in the
> short term, it may be overkill by the time dirty-bitmap is changed to
> byte alignment (right now, bdrv_getlength() is always sector-aligned,
> because it rounds up; but if we ever make it byte-accurate, then the
> trailing cluster might not be a sector-aligned byte length).

This is true, but I think for the time being the assertion is worth it.

Kevin

* Re: [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based Eric Blake
@ 2017-07-06 16:02   ` Kevin Wolf
  2017-07-06 16:24     ` Eric Blake
  2017-07-07  2:55     ` Eric Blake
  0 siblings, 2 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 16:02 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Juan Quintela, Dr. David Alan Gilbert

On 05.07.2017 at 23:08, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned
> on input and that *pnum is sector-aligned on return to the caller,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, this code adds usages like
> DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
> values, where the call might reasonably give non-aligned results
> in the future; on the other hand, no rounding is needed for callers
> that should just continue to work with byte alignment.
> 
> For the most part this patch is just the addition of scaling at the
> callers followed by inverse scaling at bdrv_is_allocated().  But
> some code, particularly bdrv_commit(), gets a lot simpler because it
> no longer has to mess with sectors; also, it is now possible to pass
> NULL if the caller does not care how much of the image is allocated
> beyond the initial offset.
> 
> For ease of review, bdrv_is_allocated_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

> diff --git a/block/io.c b/block/io.c
> index f662c97..cb40069 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1036,14 +1036,15 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>          int64_t start_sector = offset >> BDRV_SECTOR_BITS;
>          int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
>          unsigned int nb_sectors = end_sector - start_sector;
> -        int pnum;
> +        int64_t pnum;
> 
> -        ret = bdrv_is_allocated(bs, start_sector, nb_sectors, &pnum);
> +        ret = bdrv_is_allocated(bs, start_sector << BDRV_SECTOR_BITS,
> +                                nb_sectors << BDRV_SECTOR_BITS, &pnum);
>          if (ret < 0) {
>              goto out;
>          }
> 
> -        if (!ret || pnum != nb_sectors) {
> +        if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
>              ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
>              goto out;
>          }

This code would become a lot simpler if you just removed the local
variables start_sector/end_sector and used the byte offsets instead that
are available in this function. (And even simpler once sector alignment
isn't required any more for bdrv_is_allocated(). Should we leave a
comment here so we won't forget the conversion?)

> @@ -1904,15 +1905,26 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
>                                         sector_num, nb_sectors, pnum, file);
>  }
> 
> -int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
> -                                   int nb_sectors, int *pnum)
> +int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
> +                                   int64_t bytes, int64_t *pnum)
>  {
>      BlockDriverState *file;
> -    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
> -                                        &file);
> +    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
> +    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
> +    int64_t ret;
> +    int psectors;
> +
> +    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
> +           bytes < INT_MAX * BDRV_SECTOR_SIZE);

I think we can actually assert bytes <= INT_MAX. Both ways to write the
assertion are only true because of BDRV_REQUEST_MAX_SECTORS, which
ensures that the byte counts fit in an int.

Some of the places that call bdrv_is_allocated() actually depend on this
because your patches tend to use nb_sectors << BDRV_SECTOR_BITS (which
truncates) rather than nb_sectors * BDRV_SECTOR_SIZE (which is always a
64 bit calculation).

> +    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
> +                                &file);
>      if (ret < 0) {
>          return ret;
>      }
> +    if (pnum) {
> +        *pnum = psectors * BDRV_SECTOR_SIZE;
> +    }
>      return !!(ret & BDRV_BLOCK_ALLOCATED);
>  }

> diff --git a/block/stream.c b/block/stream.c
> index e3dd2ac..e5f2a08 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -137,6 +137,7 @@ static void coroutine_fn stream_run(void *opaque)
> 
>      for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
>          bool copy;
> +        int64_t count = 0;
> 
>          /* Note that even when no rate limit is applied we need to yield
>           * with no pending I/O here so that bdrv_drain_all() returns.
> @@ -148,8 +149,8 @@ static void coroutine_fn stream_run(void *opaque)
> 
>          copy = false;
> 
> -        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
> -                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
> +        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
> +        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>          if (ret == 1) {
>              /* Allocated in the top, no need to copy.  */
>          } else if (ret >= 0) {

I'm not sure if I agree with rounding up here. If ret == 0, it could
mean that we skip some data that is actually allocated. So I'd prefer an
assertion instead.

The last patch of the series gets rid of this by switching the whole
function to bytes, so not that important.

> diff --git a/migration/block.c b/migration/block.c
> index 7674ae1..4a48e5c 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -34,7 +34,7 @@
>  #define BLK_MIG_FLAG_PROGRESS           0x04
>  #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
> 
> -#define MAX_IS_ALLOCATED_SEARCH 65536
> +#define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
> 
>  #define MAX_INFLIGHT_IO 512
> 
> @@ -267,6 +267,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>      BlockBackend *bb = bmds->blk;
>      BlkMigBlock *blk;
>      int nr_sectors;
> +    int64_t count;
> 
>      if (bmds->shared_base) {
>          qemu_mutex_lock_iothread();
> @@ -274,9 +275,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>          /* Skip unallocated sectors; intentionally treats failure as
>           * an allocated sector */
>          while (cur_sector < total_sectors &&
> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
> -            cur_sector += nr_sectors;
> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>          }
>          aio_context_release(blk_get_aio_context(bb));
>          qemu_mutex_unlock_iothread();

This one stays and it has the same problem. I think you need to round
down if you don't want to accidentally skip migrating allocated data.

> diff --git a/qemu-img.c b/qemu-img.c
> index d5a1560..5271b41 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv)
>          int64_t new_backing_num_sectors = 0;
>          uint64_t sector;
>          int n;
> +        int64_t count;
>          float local_progress = 0;
> 
>          buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> @@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv)
>              }
> 
>              /* If the cluster is allocated, we don't need to take action */
> -            ret = bdrv_is_allocated(bs, sector, n, &n);
> +            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
> +                                    n << BDRV_SECTOR_BITS, &count);
>              if (ret < 0) {
>                  error_report("error while reading image metadata: %s",
>                               strerror(-ret));
>                  goto out;
>              }
> +            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>              if (ret) {
>                  continue;
>              }

Same thing. Sectors that are half allocated and half free must be
considered completely allocated if the caller can't do byte alignment.
Just rounding up without looking at ret isn't correct.
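
A sketch of what consulting ret could look like (the helper name and constants are illustrative, not QEMU code): round up only when the answer is "allocated", and report a partially covered sector as allocated rather than free.

```c
#include <stdint.h>

#define SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Map a byte-granularity allocation answer (ret, count) back onto
 * whole sectors without ever reporting allocated data as free. */
static int sectors_from_bytes(int ret, int64_t count, int64_t *nsec)
{
    if (ret) {
        /* Allocated: rounding up only widens the allocated range,
         * which is the conservative direction. */
        *nsec = DIV_ROUND_UP(count, SECTOR_SIZE);
    } else if (count < SECTOR_SIZE) {
        /* Sub-sector hole: the rest of the sector may be allocated,
         * so treat the whole sector as allocated. */
        *nsec = 1;
        ret = 1;
    } else {
        /* Unallocated: round down so a partially allocated trailing
         * sector is not skipped. */
        *nsec = count / SECTOR_SIZE;
    }
    return ret;
}
```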

> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index b0ea327..e529b8f 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1894,15 +1889,14 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
>          }
> 
>          retstr = ret ? "    allocated" : "not allocated";
> -        cvtstr(num << BDRV_SECTOR_BITS, s1, sizeof(s1));
> -        cvtstr(offset << BDRV_SECTOR_BITS, s2, sizeof(s2));
> +        cvtstr(num, s1, sizeof(s1));
> +        cvtstr(offset, s2, sizeof(s2));
>          printf("%s (0x%" PRIx64 ") bytes %s at offset %s (0x%" PRIx64 ")\n",
> -               s1, num << BDRV_SECTOR_BITS, retstr,
> -               s2, offset << BDRV_SECTOR_BITS);
> +               s1, num, retstr, s2, offset);
> 
>          offset += num;
> -        nb_sectors -= num;
> -    } while (offset < total_sectors);
> +        bytes -= num;
> +    };

This semicolon isn't needed.

Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based
  2017-07-06 16:02   ` Kevin Wolf
@ 2017-07-06 16:24     ` Eric Blake
  2017-07-07  2:55     ` Eric Blake
  1 sibling, 0 replies; 46+ messages in thread
From: Eric Blake @ 2017-07-06 16:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Juan Quintela, Dr. David Alan Gilbert


On 07/06/2017 11:02 AM, Kevin Wolf wrote:
> Am 05.07.2017 um 23:08 hat Eric Blake geschrieben:
>> We are gradually moving away from sector-based interfaces, towards
>> byte-based.  In the common case, allocation is unlikely to ever use
>> values that are not naturally sector-aligned, but it is possible
>> that byte-based values will let us be more precise about allocation
>> at the end of an unaligned file that can do byte-based access.
>>
>> Changing the signature of the function to use int64_t *pnum ensures
>> that the compiler enforces that all callers are updated.  For now,
>> the io.c layer still assert()s that all callers are sector-aligned
>> on input and that *pnum is sector-aligned on return to the caller,
>> but that can be relaxed when a later patch implements byte-based
>> block status.  Therefore, this code adds usages like
>> DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
>> values, where the call might reasonably give non-aligned results
>> in the future; on the other hand, no rounding is needed for callers
>> that should just continue to work with byte alignment.
>>
>> For the most part this patch is just the addition of scaling at the
>> callers followed by inverse scaling at bdrv_is_allocated().  But
>> some code, particularly bdrv_commit(), gets a lot simpler because it
>> no longer has to mess with sectors; also, it is now possible to pass
>> NULL if the caller does not care how much of the image is allocated
>> beyond the initial offset.
>>
>> For ease of review, bdrv_is_allocated_above() will be tackled
>> separately.
>>

>> @@ -1036,14 +1036,15 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,

>>
>> -        if (!ret || pnum != nb_sectors) {
>> +        if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
>>              ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
>>              goto out;
>>          }
> 
> This code would become a lot simpler if you just removed the local
> variables start_sector/end_sector and used the byte offsets instead that
> are available in this function. (And even simpler once sector alignment
> isn't required any more for bdrv_is_allocated(). Should we leave a
> comment here so we won't forget the conversion?)

Hmm - that cleanup is not (yet) done in my bdrv_get_block_status series
(where I finally drop the need for sector alignment in
bdrv_is_allocated).  Yes, I can add a temporary comment here (that aids
with the understanding of intermediate states), as well as a new
followup patch that actually does the cleanup.

>> -int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>> -                                   int nb_sectors, int *pnum)
>> +int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>> +                                   int64_t bytes, int64_t *pnum)
>>  {
>>      BlockDriverState *file;
>> -    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
>> -                                        &file);
>> +    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
>> +    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
>> +    int64_t ret;
>> +    int psectors;
>> +
>> +    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
>> +           bytes < INT_MAX * BDRV_SECTOR_SIZE);
> 
> I think we can actually assert bytes <= INT_MAX. Both ways to write the
> assertion are only true because of BDRV_REQUEST_MAX_SECTORS, which
> ensures that the byte counts fit in an int.

That's a tighter assertion, but if it passes, I'm fine using it.

> 
> Some of the places that call bdrv_is_allocated() actually depend on this
> because your patches tend to use nb_sectors << BDRV_SECTOR_BITS (which
> truncates) rather than nb_sectors * BDRV_SECTOR_BYTES (which is always a
> 64 bit calculation).

Actually, for this series of cleanups, I tried really hard to use *
_BYTES everywhere that I widen sectors to bytes, and limit myself to >>
_BITS for going from bytes back to sectors (after also checking if I
needed DIV_ROUND_UP instead of straight division).  If you spot places
where I'm still doing a 32-bit << _BITS, please call it out (as I do
remember that I caused bugs that way back when I converted pdiscard to
byte-based, back near commit d2cb36af).
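
The truncation Kevin mentions is easy to reproduce in isolation (constants mirror QEMU's, the helpers are illustrative; the unsigned cast makes the 32-bit wraparound well defined so it can be demonstrated without undefined behaviour):

```c
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define BDRV_SECTOR_SIZE (1LL << BDRV_SECTOR_BITS)

/* `nb_sectors << BDRV_SECTOR_BITS` is evaluated in 32 bits when
 * nb_sectors is an int, so large counts wrap before the result is
 * widened to int64_t. */
static int64_t bytes_by_shift(int nb_sectors)
{
    return (int64_t)(int32_t)((uint32_t)nb_sectors << BDRV_SECTOR_BITS);
}

/* Multiplying by the 64-bit BDRV_SECTOR_SIZE widens first, so the
 * full value survives. */
static int64_t bytes_by_mul(int nb_sectors)
{
    return nb_sectors * BDRV_SECTOR_SIZE;
}
```

For 8388608 sectors (4 GiB) the shift wraps to 0, while the multiplication correctly yields 4294967296.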


>> +++ b/block/stream.c
>> @@ -137,6 +137,7 @@ static void coroutine_fn stream_run(void *opaque)
>>
>>      for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
>>          bool copy;
>> +        int64_t count = 0;
>>
>>          /* Note that even when no rate limit is applied we need to yield
>>           * with no pending I/O here so that bdrv_drain_all() returns.
>> @@ -148,8 +149,8 @@ static void coroutine_fn stream_run(void *opaque)
>>
>>          copy = false;
>>
>> -        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
>> -                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
>> +        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
>> +        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>          if (ret == 1) {
>>              /* Allocated in the top, no need to copy.  */
>>          } else if (ret >= 0) {
> 
> I'm not sure if I agree with rounding up here. If ret == 0, it could
> mean that we skip some data that is actually allocated. So I'd like an
> assertion better.
> 
> The last patch of the series gets rid of this by switching the whole
> function to bytes, so not that important.

A temporary assertion is fine (and you're right that the assertion goes
away once the function uses bytes everywhere).

Sounds like we're racking up enough cleanups that I'll spin a v5, but
also sounds like we're real close to getting this in.

>> @@ -274,9 +275,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>          /* Skip unallocated sectors; intentionally treats failure as
>>           * an allocated sector */
>>          while (cur_sector < total_sectors &&
>> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
>> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>> -            cur_sector += nr_sectors;
>> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
>> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>          }
>>          aio_context_release(blk_get_aio_context(bb));
>>          qemu_mutex_unlock_iothread();
> 
> This one stays and it has the same problem. I think you need to round
> down if you don't want to accidentally skip migrating allocated data.

I haven't (yet) tackled trying to rewrite this one to do byte-based
iteration, but that may help.  I'm also worried (in general) that if
something truncates a sub-sector result down to zero in code that is
still sector-based, then we could get stuck in an infinite loop of
querying the same sector offset over and over again instead of making
progress.

But for this particular code, I see what you are saying - we are trying
to optimize the migration by skipping unallocated sectors, but if there
is nothing to skip, we still make migration progress as long as I
remember to break from the loop if I ever get a sub-sector answer (thus
migrating the whole sector, including the [unlikely] initial unallocated
portion).  Okay, I'll fix it for v5.
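
A minimal model of that fix (toy callback and constants; the actual v5 code may differ) shows how breaking out on a sub-sector answer both avoids the infinite loop and keeps the partially allocated sector in the migration:

```c
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical stand-in for bdrv_is_allocated(): fills *count with
 * the length in bytes of the run at `offset` and returns 0 if that
 * run is unallocated, 1 if allocated. */
typedef int (*is_allocated_fn)(int64_t offset, int64_t *count);

/* Skip whole unallocated sectors, but stop as soon as the answer is
 * sub-sector so the loop cannot spin on the same offset forever. */
static int64_t skip_unallocated(is_allocated_fn is_allocated,
                                int64_t cur_sector, int64_t total_sectors)
{
    int64_t count;

    while (cur_sector < total_sectors &&
           !is_allocated(cur_sector * SECTOR_SIZE, &count)) {
        if (count < SECTOR_SIZE) {
            /* Partially allocated sector: migrate the whole sector,
             * including its (unlikely) unallocated head, rather than
             * re-querying the same offset. */
            break;
        }
        cur_sector += count / SECTOR_SIZE;
    }
    return cur_sector;
}

/* Toy image: the first 3*SECTOR_SIZE + 100 bytes are a hole. */
static int toy_is_allocated(int64_t offset, int64_t *count)
{
    const int64_t hole_end = 3 * SECTOR_SIZE + 100;

    if (offset < hole_end) {
        *count = hole_end - offset;
        return 0;
    }
    *count = SECTOR_SIZE;
    return 1;
}
```

Starting from sector 0, the loop skips the 3 fully unallocated sectors, then sees the 100-byte sub-sector hole and stops at sector 3 instead of looping there.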

> 
>> diff --git a/qemu-img.c b/qemu-img.c
>> index d5a1560..5271b41 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv)
>>          int64_t new_backing_num_sectors = 0;
>>          uint64_t sector;
>>          int n;
>> +        int64_t count;
>>          float local_progress = 0;
>>
>>          buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> @@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv)
>>              }
>>
>>              /* If the cluster is allocated, we don't need to take action */
>> -            ret = bdrv_is_allocated(bs, sector, n, &n);
>> +            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
>> +                                    n << BDRV_SECTOR_BITS, &count);
>>              if (ret < 0) {
>>                  error_report("error while reading image metadata: %s",
>>                               strerror(-ret));
>>                  goto out;
>>              }
>> +            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>              if (ret) {
>>                  continue;
>>              }
> 
> Same thing. Sectors that are half allocated and half free must be
> considered completely allocated if the caller can't do byte alignment.
> Just rounding up without looking at ret isn't correct.
> 
>> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
>> index b0ea327..e529b8f 100644
>> --- a/qemu-io-cmds.c
>> +++ b/qemu-io-cmds.c
>> @@ -1894,15 +1889,14 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
>>          }
>>
>>          retstr = ret ? "    allocated" : "not allocated";
>> -        cvtstr(num << BDRV_SECTOR_BITS, s1, sizeof(s1));
>> -        cvtstr(offset << BDRV_SECTOR_BITS, s2, sizeof(s2));
>> +        cvtstr(num, s1, sizeof(s1));
>> +        cvtstr(offset, s2, sizeof(s2));
>>          printf("%s (0x%" PRIx64 ") bytes %s at offset %s (0x%" PRIx64 ")\n",
>> -               s1, num << BDRV_SECTOR_BITS, retstr,
>> -               s2, offset << BDRV_SECTOR_BITS);
>> +               s1, num, retstr, s2, offset);
>>
>>          offset += num;
>> -        nb_sectors -= num;
>> -    } while (offset < total_sectors);
>> +        bytes -= num;
>> +    };
> 
> This semicolon isn't needed.

Indeed. I converted do/while to plain while, but not quite correctly ;)


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors Eric Blake
  2017-07-06  0:23   ` John Snow
@ 2017-07-06 16:48   ` Kevin Wolf
  2017-07-06 17:03     ` Eric Blake
  1 sibling, 1 reply; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 16:48 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Stefan Hajnoczi, Fam Zheng,
	Max Reitz

Am 05.07.2017 um 23:08 hat Eric Blake geschrieben:
> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
> which is a field that can have stale contents depending on the value
> of intermediate->has_variable_length.  An audit shows that we are safe
> (we were first calling through bdrv_co_get_block_status() which in
> turn calls bdrv_nb_sectors() and therefore just refreshed the current
> length), but it's nicer to favor our accessor functions to avoid having
> to repeat such an audit, even if it means refresh_total_sectors() is
> called more frequently.
> 
> Suggested-by: John Snow <jsnow@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> 
> ---
> v3-v4: no change
> v2: new patch
> ---
>  block/io.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index cb40069..fb8d1c7 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1952,6 +1952,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>      intermediate = top;
>      while (intermediate && intermediate != base) {
>          int64_t pnum_inter;
> +        int64_t size_inter;
>          int psectors_inter;
> 
>          ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
> @@ -1969,13 +1970,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,
> 
>          /*
>           * [sector_num, nb_sectors] is unallocated on top but intermediate
> -         * might have
> -         *
> -         * [sector_num+x, nr_sectors] allocated.
> +         * might have [sector_num+x, nb_sectors-x] allocated.
>           */

I tried to figure out what this comment wants to tell us, and neither
the original version nor your changed one seemed to make a lot of sense
to me. The only case that I can see that actually needs the following
block is a case like this:

    (. = unallocated, # = allocated)

    top             ....####
    intermediate    ........

Our initial request was for 8 sectors, but when going to the
intermediate node, we need to reduce this to 4 sectors, otherwise we
would return unallocated for sectors 5 to 8 even though they are
allocated in top.

That's kind of the opposite of what the comment says, though...
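
The diagram can be checked with a toy model (illustrative helpers, not QEMU's API, and simplified: it clamps unconditionally, ignoring the end-of-file special case in the real code). Without reducing n after the unallocated query on top, sectors 5 to 8 would indeed come back as unallocated:

```c
#include <stdint.h>

/* Toy two-layer chain matching the diagram above:
 *   layer 0 (top)           ....####   sectors 4-7 allocated
 *   layer 1 (intermediate)  ........   nothing allocated
 * Returns 1/0 and sets *pnum to the run length at `sector`,
 * capped at nb. */
static int toy_is_allocated(int layer, int sector, int nb, int *pnum)
{
    int allocated, run_end;

    if (layer == 0) {
        allocated = sector >= 4;
        run_end = allocated ? 8 : 4;
    } else {
        allocated = 0;
        run_end = 8;
    }
    *pnum = (run_end - sector < nb) ? run_end - sector : nb;
    return allocated;
}

/* Minimal sketch of bdrv_is_allocated_above()'s clamping logic. */
static int toy_is_allocated_above(int sector, int nb, int *pnum)
{
    int n = nb;

    for (int layer = 0; layer < 2; layer++) {
        int pnum_inter;
        int ret = toy_is_allocated(layer, sector, nb, &pnum_inter);

        if (ret) {
            *pnum = pnum_inter;
            return 1;
        }
        /* Unallocated in this layer, but a shallower layer already
         * changed state sooner: clamp the reported run length so we
         * never report more than the shortest unallocated prefix. */
        if (n > pnum_inter) {
            n = pnum_inter;
        }
    }
    *pnum = n;
    return 0;
}
```

Querying [0, 8) returns "unallocated" with pnum clamped to 4 (the intermediate layer alone would have said 8); querying [4, 4) returns "allocated" with pnum 4.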

> +        size_inter = bdrv_nb_sectors(intermediate);
> +        if (size_inter < 0) {
> +            return size_inter;
> +        }
>          if (n > psectors_inter &&
> -            (intermediate == top ||
> -             sector_num + psectors_inter < intermediate->total_sectors)) {
> +            (intermediate == top || sector_num + psectors_inter < size_inter)) {
>              n = psectors_inter;
>          }

The actual code change looks good.

Kevin


* Re: [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors
  2017-07-06 16:48   ` Kevin Wolf
@ 2017-07-06 17:03     ` Eric Blake
  2017-07-06 17:27       ` Kevin Wolf
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-06 17:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, jsnow, jcody, qemu-block, Stefan Hajnoczi, Fam Zheng,
	Max Reitz


On 07/06/2017 11:48 AM, Kevin Wolf wrote:
> Am 05.07.2017 um 23:08 hat Eric Blake geschrieben:
>> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
>> which is a field that can have stale contents depending on the value
>> of intermediate->has_variable_length.  An audit shows that we are safe
>> (we were first calling through bdrv_co_get_block_status() which in
>> turn calls bdrv_nb_sectors() and therefore just refreshed the current
>> length), but it's nicer to favor our accessor functions to avoid having
>> to repeat such an audit, even if it means refresh_total_sectors() is
>> called more frequently.
>>
>> @@ -1969,13 +1970,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>>
>>          /*
>>           * [sector_num, nb_sectors] is unallocated on top but intermediate
>> -         * might have
>> -         *
>> -         * [sector_num+x, nr_sectors] allocated.
>> +         * might have [sector_num+x, nb_sectors-x] allocated.
>>           */
> 
> I tried to figure out what this comment wants to tell us, and neither
> the original version nor your changed one seemed to make a lot of sense
> to me.

The original one was definitely weird. I'm fine with bike-shedding on my
replacement (including deleting the comment altogether).

> The only case that I can see that actually needs the following
> block is a case like this:
> 
>     (. = unallocated, # = allocated)
> 
>     top             ....####
>     intermediate    ........
> 
> Our initial request was for 8 sectors, but when going to the
> intermediate node, we need to reduce this to 4 sectors, otherwise we
> would return unallocated for sectors 5 to 8 even though they are
> allocated in top.
> 
> That's kind of the opposite of what the comment says, though...

I think the comment was trying to worry about:

    top             ........
    intermediate    ....####

then our first query says we have 8 sectors unallocated, but we can't
return 8 unless we also check 8 sectors of the intermediate image
(because the intermediate image may have sectors 4-7 allocated).  So it
is explaining why the for loop is continuing down the backing chain.

If we ever get an answer of allocated, we can return immediately; we
only have to keep looking (on potentially smaller prefixes) as long as
we have an answer of unallocated and more backing chain to still check.

On the other hand, there's this situation:

top           ....####
intermediate  ####....

Right now (whether or not you apply my patch), the code only returns a
pnum of 4 sectors allocated (the first query on top sees 4 unallocated,
the second query on intermediate sees 4 allocated).  But in reality, we
COULD return a pnum of 8 (between the two images being queried, we can
account for an allocation of 8 sectors), although doing so requires
additional bookkeeping and calls into the drivers (get_status(top, 0, 8)
returns pnum of 4, get_status(intermediate, 0, 4) returns pnum of 4,
then get_status(top, 4, 4) returns pnum of 4, so that the overall
is_allocated(top, intermediate, 0, 8) sets pnum of 8).  Maybe it's worth
doing, but not in the same series that converts to byte-based.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based
  2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based Eric Blake
@ 2017-07-06 17:13   ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 17:13 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Wen Congyang, Xie Changlong

Am 05.07.2017 um 23:08 hat Eric Blake geschrieben:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, for the most part this patch is just the
> addition of scaling at the callers followed by inverse scaling at
> bdrv_is_allocated().  But some code, particularly stream_run(),
> gets a lot simpler because it no longer has to mess with sectors.
> 
> For ease of review, bdrv_is_allocated() was tackled separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> [replication part]
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> 
> ---
> v3-v4: no change
> v2: tweak function comments, favor bdrv_getlength() over ->total_sectors
> ---
>  include/block/block.h |  2 +-
>  block/commit.c        | 20 ++++++++------------
>  block/io.c            | 42 ++++++++++++++++++++----------------------
>  block/mirror.c        |  5 ++++-
>  block/replication.c   | 17 ++++++++++++-----
>  block/stream.c        | 21 +++++++++------------
>  qemu-img.c            | 10 +++++++---
>  7 files changed, 61 insertions(+), 56 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index d3e01fb..f0fdbe8 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -430,7 +430,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>  int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                        int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
> -                            int64_t sector_num, int nb_sectors, int *pnum);
> +                            int64_t offset, int64_t bytes, int64_t *pnum);
> 
>  bool bdrv_is_read_only(BlockDriverState *bs);
>  bool bdrv_is_writable(BlockDriverState *bs);
> diff --git a/block/commit.c b/block/commit.c
> index 241aa95..774a8a5 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -146,7 +146,7 @@ static void coroutine_fn commit_run(void *opaque)
>      int64_t offset;
>      uint64_t delay_ns = 0;
>      int ret = 0;
> -    int n = 0; /* sectors */
> +    int64_t n = 0; /* bytes */
>      void *buf = NULL;
>      int bytes_written = 0;
>      int64_t base_len;
> @@ -171,7 +171,7 @@ static void coroutine_fn commit_run(void *opaque)
> 
>      buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
> 
> -    for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
> +    for (offset = 0; offset < s->common.len; offset += n) {
>          bool copy;
> 
>          /* Note that even when no rate limit is applied we need to yield
> @@ -183,15 +183,12 @@ static void coroutine_fn commit_run(void *opaque)
>          }
>          /* Copy if allocated above the base */
>          ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
> -                                      offset / BDRV_SECTOR_SIZE,
> -                                      COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> -                                      &n);
> +                                      offset, COMMIT_BUFFER_SIZE, &n);
>          copy = (ret == 1);
> -        trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
> +        trace_commit_one_iteration(s, offset, n, ret);
>          if (copy) {
> -            ret = commit_populate(s->top, s->base, offset,
> -                                  n * BDRV_SECTOR_SIZE, buf);
> -            bytes_written += n * BDRV_SECTOR_SIZE;
> +            ret = commit_populate(s->top, s->base, offset, n, buf);
> +            bytes_written += n;
>          }
>          if (ret < 0) {
>              BlockErrorAction action =
> @@ -204,11 +201,10 @@ static void coroutine_fn commit_run(void *opaque)
>              }
>          }
>          /* Publish progress */
> -        s->common.offset += n * BDRV_SECTOR_SIZE;
> +        s->common.offset += n;
> 
>          if (copy && s->common.speed) {
> -            delay_ns = ratelimit_calculate_delay(&s->limit,
> -                                                 n * BDRV_SECTOR_SIZE);
> +            delay_ns = ratelimit_calculate_delay(&s->limit, n);
>          }
>      }
> 
> diff --git a/block/io.c b/block/io.c
> index fb8d1c7..569c503 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1931,54 +1931,52 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>  /*
>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>   *
> - * Return true if the given sector is allocated in any image between
> - * BASE and TOP (inclusive).  BASE can be NULL to check if the given
> - * sector is allocated in any image of the chain.  Return false otherwise,
> + * Return true if the (prefix of the) given range is allocated in any image

(a prefix of) the given range

> + * between BASE and TOP (inclusive).  BASE can be NULL to check if the given
> + * offset is allocated in any image of the chain.  Return false otherwise,
>   * or negative errno on failure.
>   *
> - * 'pnum' is set to the number of sectors (including and immediately following
> - *  the specified sector) that are known to be in the same
> - *  allocated/unallocated state.
> + * 'pnum' is set to the number of bytes (including and immediately
> + * following the specified offset) that are known to be in the same
> + * allocated/unallocated state.  Note that a subsequent call starting
> + * at 'offset + *pnum' may return the same allocation status (in other
> + * words, the result is not necessarily the maximum possible range);
> + * but 'pnum' will only be 0 when end of file is reached.
>   *
>   */
>  int bdrv_is_allocated_above(BlockDriverState *top,
>                              BlockDriverState *base,
> -                            int64_t sector_num,
> -                            int nb_sectors, int *pnum)
> +                            int64_t offset, int64_t bytes, int64_t *pnum)
>  {
>      BlockDriverState *intermediate;
> -    int ret, n = nb_sectors;
> +    int ret;
> +    int64_t n = bytes;
> 
>      intermediate = top;
>      while (intermediate && intermediate != base) {
>          int64_t pnum_inter;
>          int64_t size_inter;
> -        int psectors_inter;
> 
> -        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
> -                                nb_sectors * BDRV_SECTOR_SIZE,
> -                                &pnum_inter);
> +        ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
>          if (ret < 0) {
>              return ret;
>          }
> -        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
> -        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
>          if (ret) {
> -            *pnum = psectors_inter;
> +            *pnum = pnum_inter;
>              return 1;
>          }
> 
>          /*
> -         * [sector_num, nb_sectors] is unallocated on top but intermediate
> -         * might have [sector_num+x, nb_sectors-x] allocated.
> +         * [offset, bytes] is unallocated on top but intermediate
> +         * might have [offset+x, bytes-x] allocated.
>           */

The comment still doesn't make sense. It already starts with the fact
that [offset, pnum_inter] is the unallocated range, not [offset, bytes],
and doesn't end with offset + x never actually being looked at.

> -        size_inter = bdrv_nb_sectors(intermediate);
> +        size_inter = bdrv_getlength(intermediate);
>          if (size_inter < 0) {
>              return size_inter;
>          }
> -        if (n > psectors_inter &&
> -            (intermediate == top || sector_num + psectors_inter < size_inter)) {
> -            n = psectors_inter;
> +        if (n > pnum_inter &&
> +            (intermediate == top || offset + pnum_inter < size_inter)) {
> +            n = pnum_inter;
>          }
> 
>          intermediate = backing_bs(intermediate);
> diff --git a/block/mirror.c b/block/mirror.c
> index f54a8d7..c717f60 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -621,6 +621,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>      BlockDriverState *bs = s->source;
>      BlockDriverState *target_bs = blk_bs(s->target);
>      int ret, n;
> +    int64_t count;
> 
>      end = s->bdev_length / BDRV_SECTOR_SIZE;
> 
> @@ -670,11 +671,13 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>              return 0;
>          }
> 
> -        ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
> +        ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
> +                                      nb_sectors * BDRV_SECTOR_SIZE, &count);
>          if (ret < 0) {
>              return ret;
>          }
> 
> +        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);

The usual thing again. Partially allocated sectors need to be considered
fully allocated rather than using whatever the status of the first part
is.

>          assert(n > 0);
>          if (ret == 1) {
>              bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
>
> diff --git a/qemu-img.c b/qemu-img.c
> index 5271b41..960f42a 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1477,12 +1477,16 @@ static int img_compare(int argc, char **argv)
>          }
> 
>          for (;;) {
> +            int64_t count;
> +
>              nb_sectors = sectors_to_process(total_sectors_over, sector_num);
>              if (nb_sectors <= 0) {
>                  break;
>              }
> -            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL, sector_num,
> -                                          nb_sectors, &pnum);
> +            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
> +                                          sector_num * BDRV_SECTOR_SIZE,
> +                                          nb_sectors * BDRV_SECTOR_SIZE,
> +                                          &count);
>              if (ret < 0) {
>                  ret = 3;
>                  error_report("Sector allocation test failed for %s",
> @@ -1490,7 +1494,7 @@ static int img_compare(int argc, char **argv)
>                  goto out;
> 
>              }
> -            nb_sectors = pnum;
> +            nb_sectors = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>              if (ret) {
>                  ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
>                                            filename_over, buf1, quiet);

And one final instance of the same bug.

Kevin


* Re: [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors
  2017-07-06 17:03     ` Eric Blake
@ 2017-07-06 17:27       ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-06 17:27 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Stefan Hajnoczi, Fam Zheng,
	Max Reitz


Am 06.07.2017 um 19:03 hat Eric Blake geschrieben:
> On 07/06/2017 11:48 AM, Kevin Wolf wrote:
> > Am 05.07.2017 um 23:08 hat Eric Blake geschrieben:
> >> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
> >> which is a field that can have stale contents depending on the value
> >> of intermediate->has_variable_length.  An audit shows that we are safe
> >> (we were first calling through bdrv_co_get_block_status() which in
> >> turn calls bdrv_nb_sectors() and therefore just refreshed the current
> >> length), but it's nicer to favor our accessor functions to avoid having
> >> to repeat such an audit, even if it means refresh_total_sectors() is
> >> called more frequently.
> >>
> >> @@ -1969,13 +1970,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,
> >>
> >>          /*
> >>           * [sector_num, nb_sectors] is unallocated on top but intermediate
> >> -         * might have
> >> -         *
> >> -         * [sector_num+x, nr_sectors] allocated.
> >> +         * might have [sector_num+x, nb_sectors-x] allocated.
> >>           */
> > 
> > I tried to figure out what this comment wants to tell us, and neither
> > the original version nor your changed one seemed to make a lot of sense
> > to me.
> 
> The original one was definitely weird. I'm fine with bike-shedding on my
> replacement (including deleting the comment altogether).
> 
> > The only case that I can see that actually needs the following
> > block is a case like this:
> > 
> >     (. = unallocated, # = allocated)
> > 
> >     top             ....####
> >     intermediate    ........
> > 
> > Our initial request was for 8 sectors, but when going to the
> > intermediate node, we need to reduce this to 4 sectors, otherwise we
> > would return unallocated for sectors 5 to 8 even though they are
> > allocated in top.
> > 
> > That's kind of the opposite of what the comment says, though...
> 
> I think the comment was trying to worry about:
> 
>     top             ........
>     intermediate    ....####
> 
> then our first query says we have 8 sectors unallocated, but we can't
> return 8 unless we also check 8 sectors of the intermediate image
> (because the intermediate image may have sectors 4-7 allocated).  So it
> is explaining why the for loop is continuing down the backing chain.

Oh, that might be it. It just looked weird because it was directly above
the code that reduces n, so I thought it was related to that. (It looks
even more like that in the original commit c8c3080f.)

Maybe add "so we have to look at the next backing file" or something to
make the context clearer?

Of course, the exact values still aren't correct. [sector_num,
psectors_inter] is the unallocated part in top, and the allocated area
in the intermediate image can start anywhere, including at sector_num,
and have any length <= nb_sectors - x.

> If we ever get an answer of allocated, we can return immediately; we
> only have to keep looking (on potentially smaller prefixes) as long as
> we have an answer of unallocated and more backing chain to still check.
> 
> On the other hand, there's this situation:
> 
> top           ....####
> intermediate  ####....
> 
> Right now (whether or not you apply my patch), the code only returns a
> pnum of 4 sectors allocated (the first query on top sees 4 unallocated,
> the second query on intermediate sees 4 allocated).  But in reality, we
> COULD return a pnum of 8 (between the two images being queried, we can
> account for an allocation of 8 sectors), although doing so requires
> additional bookkeeping and calls into the drivers (get_status(top, 0, 8)
> returns pnum of 4, get_status(intermediate, 0, 4) returns pnum of 4,
> then get_status(top, 4, 4) returns pnum of 4, so that the overall
> is_allocated(top, intermediate, 0, 8) sets pnum of 8).  Maybe it's worth
> doing, but not in the same series that converts to byte-based.

Callers will generally loop anyway, so it's probably useless overhead to
check the first non-matching block twice.
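The diagrams in this exchange can be turned into a tiny executable model.
It is purely illustrative — layers here are '.'/'#' strings, one character
per sector, not BlockDriverStates — but it shows why the loop narrows the
request as it descends the backing chain, and why top "....####" over
intermediate "####...." reports only 4 allocated sectors today:

```c
#include <assert.h>

/* One layer: query the allocation status of [start, start+n).
 * Returns 1/0 for allocated/unallocated and sets *pnum to the length
 * of the prefix sharing that status, like bdrv_is_allocated(). */
static int layer_is_allocated(const char *layer, int start, int n, int *pnum)
{
    int allocated = layer[start] == '#';
    int i = start;
    while (i < start + n && (layer[i] == '#') == allocated) {
        i++;
    }
    *pnum = i - start;
    return allocated;
}

/* Walk the chain top to bottom.  A prefix only stays "unallocated"
 * while every layer agrees, so each unallocated answer shrinks the
 * range we may ask the next backing layer about. */
static int chain_is_allocated(const char **chain, int depth,
                              int start, int n, int *pnum)
{
    for (int i = 0; i < depth; i++) {
        int num;
        if (layer_is_allocated(chain[i], start, n, &num)) {
            *pnum = num;
            return 1;
        }
        n = num;  /* only this prefix is known-unallocated so far */
    }
    *pnum = n;
    return 0;
}
```

On {"....####", "####...."} over sectors [0, 8), the walk stops after
finding 4 allocated sectors in the intermediate layer; merging that with
the 4 allocated sectors at the end of top into a single pnum of 8 would
need the extra bookkeeping described above.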

Kevin


* Re: [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based
  2017-07-06 16:02   ` Kevin Wolf
  2017-07-06 16:24     ` Eric Blake
@ 2017-07-07  2:55     ` Eric Blake
  2017-07-07  9:25       ` Kevin Wolf
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Blake @ 2017-07-07  2:55 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Juan Quintela, Dr. David Alan Gilbert

On 07/06/2017 11:02 AM, Kevin Wolf wrote:

>> +++ b/qemu-img.c
>> @@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv)
>>          int64_t new_backing_num_sectors = 0;
>>          uint64_t sector;
>>          int n;
>> +        int64_t count;
>>          float local_progress = 0;
>>
>>          buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> @@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv)
>>              }
>>
>>              /* If the cluster is allocated, we don't need to take action */
>> -            ret = bdrv_is_allocated(bs, sector, n, &n);
>> +            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
>> +                                    n << BDRV_SECTOR_BITS, &count);
>>              if (ret < 0) {
>>                  error_report("error while reading image metadata: %s",
>>                               strerror(-ret));
>>                  goto out;
>>              }
>> +            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>              if (ret) {
>>                  continue;
>>              }
> 
> Same thing. Sectors that are half allocated and half free must be
> considered completely allocated if the caller can't do byte alignment.
> Just rounding up without looking at ret isn't correct.

In reality, qemu-img rebase should probably be rewritten to exploit
bdrv_get_block_status() instead of the weaker bdrv_is_allocated().
Right now, it manually calls buffer_is_zero() on every sector to learn
which sectors are candidates for optimizing, rather than exploiting
BDRV_BLOCK_ZERO where possible.  (Yes, we may STILL want to call
buffer_is_zero() on BDRV_BLOCK_DATA when we are trying to squeeze out
all parts of an image that can be compressed, but maybe that should be
an option, just as GNU dd has options for creating as much sparseness
as possible (slow, because it reads data) vs. preserving existing
sparseness (faster, because it relies on hole detection but leaves
explicitly written zeroes as non-sparse).)  I guess there's still lots
of ground for improving qemu-img, so for THIS series, I'll just leave
it at an assertion that things are sector-aligned.
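The dd-style choice described above can be sketched as a small policy
function. This is a hypothetical model: the flag names and values are
local stand-ins for QEMU's BDRV_BLOCK_DATA/BDRV_BLOCK_ZERO, and
detect_written_zeroes plays the role of the proposed option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BLOCK_DATA 0x01
#define SKETCH_BLOCK_ZERO 0x02

static bool all_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Decide whether a chunk may be written sparsely.  With
 * detect_written_zeroes (dd's slow "read and detect" mode) we also
 * scan DATA chunks for zeroes; without it, we only trust the block
 * layer's hole detection, leaving explicitly written zeroes as data. */
static bool can_sparsify(int status, const unsigned char *buf, size_t len,
                         bool detect_written_zeroes)
{
    if (status & SKETCH_BLOCK_ZERO) {
        return true;            /* known zero: no read needed */
    }
    if (!detect_written_zeroes) {
        return false;           /* preserve existing sparseness only */
    }
    return all_zero(buf, len);  /* slow path: scan the data */
}
```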

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based
  2017-07-07  2:55     ` Eric Blake
@ 2017-07-07  9:25       ` Kevin Wolf
  0 siblings, 0 replies; 46+ messages in thread
From: Kevin Wolf @ 2017-07-07  9:25 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, jsnow, jcody, qemu-block, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Juan Quintela, Dr. David Alan Gilbert

On 07.07.2017 04:55, Eric Blake wrote:
> On 07/06/2017 11:02 AM, Kevin Wolf wrote:
> 
> >> +++ b/qemu-img.c
> >> @@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv)
> >>          int64_t new_backing_num_sectors = 0;
> >>          uint64_t sector;
> >>          int n;
> >> +        int64_t count;
> >>          float local_progress = 0;
> >>
> >>          buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> >> @@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv)
> >>              }
> >>
> >>              /* If the cluster is allocated, we don't need to take action */
> >> -            ret = bdrv_is_allocated(bs, sector, n, &n);
> >> +            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
> >> +                                    n << BDRV_SECTOR_BITS, &count);
> >>              if (ret < 0) {
> >>                  error_report("error while reading image metadata: %s",
> >>                               strerror(-ret));
> >>                  goto out;
> >>              }
> >> +            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
> >>              if (ret) {
> >>                  continue;
> >>              }
> > 
> > Same thing. Sectors that are half allocated and half free must be
> > considered completely allocated if the caller can't do byte alignment.
> > Just rounding up without looking at ret isn't correct.
> 
> In reality, qemu-img rebase should probably be rewritten to exploit
> bdrv_get_block_status() instead of the weaker bdrv_is_allocated(). Right
> now, it is manually calling buffer_is_zero() on every sector to learn
> what sectors are candidates for optimizing, rather than being able to
> exploit BDRV_BLOCK_ZERO where possible

Yes, that's true, but also unrelated. As soon as you convert
bdrv_get_block_status() to byte granularity, you get the same thing
again.

Actually, having another look at this code, the solution for this
specific case is to convert the whole loop (or even function) to byte
granularity. The bdrv_is_allocated() call was the only thing that
required sectors.

> (Yes, we may STILL want to call buffer_is_zero() on BDRV_BLOCK_DATA
> when we are trying to squeeze out all parts of an image that can be
> compressed, but maybe that should be an option, just as GNU dd has
> options for creating as much sparseness as possible (slow, because it
> reads data) vs. preserving existing sparseness (faster, because it
> relies on hole detection but leaves explicitly written zeroes as
> non-sparse).)

I'd expect that typically buffer_is_zero() is only slow if it's also
effective. Most data blocks start with non-zero data in the first few
bytes, so that's really quick. And I'm sure that checking whether a
buffer is zero is faster than doing disk I/O for the same buffer, so the
case of actual zero buffers should benefit, too. The only problem is
with cases where the buffer starts with many zeros and then ends in some
real data. I'm not sure how common this is.
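The argument is about the early-exit behaviour of the scan. A naive
stand-in for buffer_is_zero() (the real QEMU function is heavily
optimized and vectorized; this is just the idea) makes it visible: a
buffer that starts with data is rejected after one byte, so the full
scan is only paid for buffers that really are zero — exactly when
sparsifying them pays off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Early-exit zero check: returns false at the first non-zero byte.
 * Typical data blocks have non-zero bytes near the start, so the
 * common negative case is nearly free; the full-length scan happens
 * only when the buffer is (almost) entirely zero. */
static bool buffer_is_zero_naive(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}
```

The remaining bad case is exactly the one noted above: a buffer with a
long zero prefix that ends in real data.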

> I guess there's still lots of ground for improving qemu-img, so for
> THIS series, I'll just leave it at an assertion that things are
> sector-aligned.

Sounds okay. If you want, you can add another patch to convert the
function to bytes, or we can do it later on top.

Kevin


Thread overview: 46+ messages
2017-07-05 21:08 [Qemu-devel] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 02/21] trace: Show blockjob actions " Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 03/21] stream: Switch stream_populate() to byte-based Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete() Eric Blake
2017-07-06  0:05   ` John Snow
2017-07-06 10:38   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 05/21] stream: Switch stream_run() to byte-based Eric Blake
2017-07-06 10:39   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 06/21] commit: Switch commit_populate() " Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 07/21] commit: Switch commit_run() " Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob " Eric Blake
2017-07-06  0:14   ` John Snow
2017-07-06 10:42   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 09/21] mirror: Switch mirror_do_zero_or_discard() " Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 10/21] mirror: Update signature of mirror_clip_sectors() Eric Blake
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based Eric Blake
2017-07-06 11:16   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 12/21] mirror: Switch mirror_do_read() " Eric Blake
2017-07-06 13:30   ` Kevin Wolf
2017-07-06 14:25     ` Eric Blake
2017-07-06 14:55       ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 13/21] mirror: Switch mirror_iteration() " Eric Blake
2017-07-06 13:47   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters() Eric Blake
2017-07-06 13:49   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based Eric Blake
2017-07-06 13:59   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 16/21] backup: Switch block_backup.h " Eric Blake
2017-07-06 14:11   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 17/21] backup: Switch backup_do_cow() " Eric Blake
2017-07-06 14:36   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 18/21] backup: Switch backup_run() " Eric Blake
2017-07-06 14:43   ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based Eric Blake
2017-07-06 16:02   ` Kevin Wolf
2017-07-06 16:24     ` Eric Blake
2017-07-07  2:55     ` Eric Blake
2017-07-07  9:25       ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors Eric Blake
2017-07-06  0:23   ` John Snow
2017-07-06 16:48   ` Kevin Wolf
2017-07-06 17:03     ` Eric Blake
2017-07-06 17:27       ` Kevin Wolf
2017-07-05 21:08 ` [Qemu-devel] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based Eric Blake
2017-07-06 17:13   ` Kevin Wolf
