* [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based
@ 2017-04-11 22:29 Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
                   ` (17 more replies)
  0 siblings, 18 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

This is part one of that conversion: bdrv_is_allocated().
Other parts (still to be written) include tracking dirty bitmaps
by bytes (it's still one bit per granularity, but now we won't
be double-scaling from bytes to sectors to granularity), then
replacing bdrv_get_block_status() with a byte based callback
in all the drivers.

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v1

It requires v9 or later of my prior work on blkdebug:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
which in turn requires Max's block-next tree:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html

Eric Blake (17):
  blockjob: Track job ratelimits via bytes, not sectors
  trace: Show blockjob actions via bytes, not sectors
  stream: Switch stream_populate() to byte-based
  stream: Switch stream_run() to byte-based
  commit: Switch commit_populate() to byte-based
  commit: Switch commit_run() to byte-based
  mirror: Switch MirrorBlockJob to byte-based
  mirror: Switch mirror_do_zero_or_discard() to byte-based
  mirror: Switch mirror_cow_align() to byte-based
  mirror: Switch mirror_do_read() to byte-based
  mirror: Switch mirror_iteration() to byte-based
  backup: Switch BackupBlockJob to byte-based
  backup: Switch block_backup.h to byte-based
  backup: Switch backup_do_cow() to byte-based
  backup: Switch backup_run() to byte-based
  block: Make bdrv_is_allocated() byte-based
  block: Make bdrv_is_allocated_above() byte-based

 include/block/block.h        |   6 +-
 include/block/block_backup.h |  11 +-
 include/qemu/ratelimit.h     |   3 +-
 block/backup.c               | 126 ++++++++----------
 block/commit.c               |  54 ++++----
 block/io.c                   |  59 +++++----
 block/mirror.c               | 300 ++++++++++++++++++++++---------------------
 block/replication.c          |  29 +++--
 block/stream.c               |  35 +++--
 block/vvfat.c                |  34 +++--
 migration/block.c            |   9 +-
 qemu-img.c                   |  15 ++-
 qemu-io-cmds.c               |  57 ++++----
 block/trace-events           |  14 +-
 14 files changed, 381 insertions(+), 371 deletions(-)

-- 
2.9.3


* [Qemu-devel] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-17 19:18   ` [Qemu-devel] [Qemu-block] " John Snow
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 02/17] trace: Show blockjob actions " Eric Blake
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

The user interface specifies job rate limits in bytes/second.
It's pointless to have our internal representation track things
in sectors/second, particularly since we want to move away from
sector-based interfaces.

Fix up a doc typo found while verifying that the ratelimit
code handles the scaling difference.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/qemu/ratelimit.h |  3 ++-
 block/backup.c           |  5 +++--
 block/commit.c           |  5 +++--
 block/mirror.c           | 13 +++++++------
 block/stream.c           |  5 +++--
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index 8da1232..8dece48 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -24,7 +24,8 @@ typedef struct {

 /** Calculate and return delay for next request in ns
  *
- * Record that we sent @p n data units. If we may send more data units
+ * Record that we sent @n data units (where @n matches the scale chosen
+ * during ratelimit_set_speed). If we may send more data units
  * in the current time slice, return 0 (i.e. no delay). Otherwise
  * return the amount of time (in ns) until the start of the next time
  * slice that will permit sending the next chunk of data.
diff --git a/block/backup.c b/block/backup.c
index a4fb288..bcfa321 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -208,7 +208,7 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -359,7 +359,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
      */
     if (job->common.speed) {
         uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-                                                      job->sectors_read);
+                                                      job->sectors_read *
+                                                      BDRV_SECTOR_SIZE);
         job->sectors_read = 0;
         block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
     } else {
diff --git a/block/commit.c b/block/commit.c
index 76a0d98..8b684e0 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -195,7 +195,8 @@ static void coroutine_fn commit_run(void *opaque)
         s->common.offset += n * BDRV_SECTOR_SIZE;

         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit, n);
+            delay_ns = ratelimit_calculate_delay(&s->limit,
+                                                 n * BDRV_SECTOR_SIZE);
         }
     }

@@ -217,7 +218,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static const BlockJobDriver commit_job_driver = {
diff --git a/block/mirror.c b/block/mirror.c
index 2173a2f..a7d0960 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -391,7 +391,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
     while (nb_chunks > 0 && sector_num < end) {
         int64_t ret;
-        int io_sectors, io_sectors_acct;
+        int io_sectors;
+        int64_t io_bytes_acct;
         BlockDriverState *file;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
@@ -439,16 +440,16 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
             io_sectors = mirror_do_read(s, sector_num, io_sectors);
-            io_sectors_acct = io_sectors;
+            io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
             mirror_do_zero_or_discard(s, sector_num, io_sectors,
                                       mirror_method == MIRROR_METHOD_DISCARD);
             if (write_zeroes_ok) {
-                io_sectors_acct = 0;
+                io_bytes_acct = 0;
             } else {
-                io_sectors_acct = io_sectors;
+                io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
             }
             break;
         default:
@@ -458,7 +459,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         sector_num += io_sectors;
         nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
         if (s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit, io_sectors_acct);
+            delay_ns = ratelimit_calculate_delay(&s->limit, io_bytes_acct);
         }
     }
     return delay_ns;
@@ -918,7 +919,7 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static void mirror_complete(BlockJob *job, Error **errp)
diff --git a/block/stream.c b/block/stream.c
index 0113710..11c5cf1 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -191,7 +191,8 @@ static void coroutine_fn stream_run(void *opaque)
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit, n);
+            delay_ns = ratelimit_calculate_delay(&s->limit,
+                                                 n * BDRV_SECTOR_SIZE);
         }
     }

@@ -220,7 +221,7 @@ static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
         error_setg(errp, QERR_INVALID_PARAMETER, "speed");
         return;
     }
-    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static const BlockJobDriver stream_job_driver = {
-- 
2.9.3


* [Qemu-devel] [PATCH 02/17] trace: Show blockjob actions via bytes, not sectors
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-17 19:18   ` [Qemu-devel] [Qemu-block] " John Snow
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 03/17] stream: Switch stream_populate() to byte-based Eric Blake
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

Upcoming patches are going to switch to byte-based interfaces
instead of sector-based.  Even worse, trace_backup_do_cow_enter()
had a weird mix of cluster and sector indices.  Make the tracing
uniformly use bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/backup.c     | 16 ++++++++++------
 block/commit.c     |  3 ++-
 block/mirror.c     | 26 +++++++++++++++++---------
 block/stream.c     |  3 ++-
 block/trace-events | 14 +++++++-------
 5 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index bcfa321..b28df8c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -102,6 +102,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     void *bounce_buffer = NULL;
     int ret = 0;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
+    int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
     int64_t start, end;
     int n;

@@ -110,18 +111,20 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     start = sector_num / sectors_per_cluster;
     end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-    trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
+    trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+                              sector_num * BDRV_SECTOR_SIZE,
+                              nb_sectors * BDRV_SECTOR_SIZE);

     wait_for_overlapping_requests(job, start, end);
     cow_request_begin(&cow_request, job, start, end);

     for (; start < end; start++) {
         if (test_bit(start, job->done_bitmap)) {
-            trace_backup_do_cow_skip(job, start);
+            trace_backup_do_cow_skip(job, start * bytes_per_cluster);
             continue; /* already copied */
         }

-        trace_backup_do_cow_process(job, start);
+        trace_backup_do_cow_process(job, start * bytes_per_cluster);

         n = MIN(sectors_per_cluster,
                 job->common.len / BDRV_SECTOR_SIZE -
@@ -138,7 +141,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                             bounce_qiov.size, &bounce_qiov,
                             is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start, ret);
+            trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
@@ -154,7 +157,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
         }
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start, ret);
+            trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
@@ -177,7 +180,8 @@ out:

     cow_request_end(&cow_request);

-    trace_backup_do_cow_return(job, sector_num, nb_sectors, ret);
+    trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
+                               nb_sectors * BDRV_SECTOR_SIZE, ret);

     qemu_co_rwlock_unlock(&job->flush_rwlock);

diff --git a/block/commit.c b/block/commit.c
index 8b684e0..11dab98 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -176,7 +176,8 @@ static void coroutine_fn commit_run(void *opaque)
                                       COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
                                       &n);
         copy = (ret == 1);
-        trace_commit_one_iteration(s, sector_num, n, ret);
+        trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+                                   n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
             ret = commit_populate(s->top, s->base, sector_num, n, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
diff --git a/block/mirror.c b/block/mirror.c
index a7d0960..d83d482 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -103,7 +103,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     int64_t chunk_num;
     int i, nb_chunks, sectors_per_chunk;

-    trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
+    trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
+                                op->nb_sectors * BDRV_SECTOR_SIZE, ret);

     s->in_flight--;
     s->sectors_in_flight -= op->nb_sectors;
@@ -268,7 +269,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);

     while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
+                                     s->in_flight);
         mirror_wait_for_io(s);
     }

@@ -294,7 +296,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     /* Copy the dirty cluster.  */
     s->in_flight++;
     s->sectors_in_flight += nb_sectors;
-    trace_mirror_one_iteration(s, sector_num, nb_sectors);
+    trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+                               nb_sectors * BDRV_SECTOR_SIZE);

     blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov, 0,
                    mirror_read_complete, op);
@@ -346,13 +349,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     if (sector_num < 0) {
         bdrv_set_dirty_iter(s->dbi, 0);
         sector_num = bdrv_dirty_iter_next(s->dbi);
-        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
+        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
+                                  BDRV_SECTOR_SIZE);
         assert(sector_num >= 0);
     }

     first_chunk = sector_num / sectors_per_chunk;
     while (test_bit(first_chunk, s->in_flight_bitmap)) {
-        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
+                                     s->in_flight);
         mirror_wait_for_io(s);
     }

@@ -428,7 +433,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         }

         while (s->in_flight >= MAX_IN_FLIGHT) {
-            trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+            trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
+                                         s->in_flight);
             mirror_wait_for_io(s);
         }

@@ -811,7 +817,8 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
             if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
                 (cnt == 0 && s->in_flight > 0)) {
-                trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
+                trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
+                                   s->buf_free_count, s->in_flight);
                 mirror_wait_for_io(s);
                 continue;
             } else if (cnt != 0) {
@@ -852,7 +859,7 @@ static void coroutine_fn mirror_run(void *opaque)
              * whether to switch to target check one last time if I/O has
              * come in the meanwhile, and if not flush the data to disk.
              */
-            trace_mirror_before_drain(s, cnt);
+            trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);

             bdrv_drained_begin(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
@@ -871,7 +878,8 @@ static void coroutine_fn mirror_run(void *opaque)
         }

         ret = 0;
-        trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
+        trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
+                                  s->synced, delay_ns);
         if (!s->synced) {
             block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
             if (block_job_is_cancelled(&s->common)) {
diff --git a/block/stream.c b/block/stream.c
index 11c5cf1..3134ec5 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -168,7 +168,8 @@ static void coroutine_fn stream_run(void *opaque)

             copy = (ret == 1);
         }
-        trace_stream_one_iteration(s, sector_num, n, ret);
+        trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+                                   n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
             ret = stream_populate(blk, sector_num, n, buf);
         }
diff --git a/block/trace-events b/block/trace-events
index 0bc5c0a..04f6463 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -18,11 +18,11 @@ bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p off
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"

 # block/stream.c
-stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
 stream_start(void *bs, void *base, void *s) "bs %p base %p s %p"

 # block/commit.c
-commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+commit_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
 commit_start(void *bs, void *base, void *top, void *s) "bs %p base %p top %p s %p"

 # block/mirror.c
@@ -31,14 +31,14 @@ mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_flush(void *s) "s %p"
 mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
 mirror_before_sleep(void *s, int64_t cnt, int synced, uint64_t delay_ns) "s %p dirty count %"PRId64" synced %d delay %"PRIu64"ns"
-mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
-mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors, int ret) "s %p sector_num %"PRId64" nb_sectors %d ret %d"
+mirror_one_iteration(void *s, int64_t offset, uint64_t bytes) "s %p offset %" PRId64 " bytes %" PRIu64
+mirror_iteration_done(void *s, int64_t offset, uint64_t bytes, int ret) "s %p offset %" PRId64 " bytes %" PRIu64 " ret %d"
 mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
-mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
+mirror_yield_in_flight(void *s, int64_t offset, int in_flight) "s %p offset %" PRId64 " in_flight %d"

 # block/backup.c
-backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
-backup_do_cow_return(void *job, int64_t sector_num, int nb_sectors, int ret) "job %p sector_num %"PRId64" nb_sectors %d ret %d"
+backup_do_cow_enter(void *job, int64_t start, int64_t offset, uint64_t bytes) "job %p start %" PRId64 " offset %" PRId64 " bytes %" PRIu64
+backup_do_cow_return(void *job, int64_t offset, uint64_t bytes, int ret) "job %p offset %" PRId64 " bytes %" PRIu64 " ret %d"
 backup_do_cow_skip(void *job, int64_t start) "job %p start %"PRId64
 backup_do_cow_process(void *job, int64_t start) "job %p start %"PRId64
 backup_do_cow_read_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"
-- 
2.9.3


* [Qemu-devel] [PATCH 03/17] stream: Switch stream_populate() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 02/17] trace: Show blockjob actions " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 04/17] stream: Switch stream_run() " Eric Blake
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/stream.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 3134ec5..5acef6d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -41,20 +41,20 @@ typedef struct StreamBlockJob {
 } StreamBlockJob;

 static int coroutine_fn stream_populate(BlockBackend *blk,
-                                        int64_t sector_num, int nb_sectors,
+                                        int64_t offset, uint64_t bytes,
                                         void *buf)
 {
     struct iovec iov = {
         .iov_base = buf,
-        .iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
+        .iov_len  = bytes,
     };
     QEMUIOVector qiov;

+    assert(bytes < SIZE_MAX);
     qemu_iovec_init_external(&qiov, &iov, 1);

     /* Copy-on-read the unallocated clusters */
-    return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, &qiov,
-                         BDRV_REQ_COPY_ON_READ);
+    return blk_co_preadv(blk, offset, qiov.size, &qiov, BDRV_REQ_COPY_ON_READ);
 }

 typedef struct {
@@ -171,7 +171,8 @@ static void coroutine_fn stream_run(void *opaque)
         trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
                                    n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = stream_populate(blk, sector_num, n, buf);
+            ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
+                                  n * BDRV_SECTOR_SIZE, buf);
         }
         if (ret < 0) {
             BlockErrorAction action =
-- 
2.9.3


* [Qemu-devel] [PATCH 04/17] stream: Switch stream_run() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (2 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 03/17] stream: Switch stream_populate() to byte-based Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 05/17] commit: Switch commit_populate() " Eric Blake
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of streaming to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/stream.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 5acef6d..3ed4fef 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -108,12 +108,11 @@ static void coroutine_fn stream_run(void *opaque)
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
     BlockDriverState *base = s->base;
-    int64_t sector_num = 0;
-    int64_t end = -1;
+    int64_t offset = 0;
     uint64_t delay_ns = 0;
     int error = 0;
     int ret = 0;
-    int n = 0;
+    int n = 0; /* sectors */
     void *buf;

     if (!bs->backing) {
@@ -126,7 +125,6 @@ static void coroutine_fn stream_run(void *opaque)
         goto out;
     }

-    end = s->common.len >> BDRV_SECTOR_BITS;
     buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);

     /* Turn on copy-on-read for the whole block device so that guest read
@@ -138,7 +136,7 @@ static void coroutine_fn stream_run(void *opaque)
         bdrv_enable_copy_on_read(bs);
     }

-    for (sector_num = 0; sector_num < end; sector_num += n) {
+    for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
         bool copy;

         /* Note that even when no rate limit is applied we need to yield
@@ -151,28 +149,26 @@ static void coroutine_fn stream_run(void *opaque)

         copy = false;

-        ret = bdrv_is_allocated(bs, sector_num,
+        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
                                 STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
-             * known-unallocated area [sector_num, sector_num+n).  */
+             * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
             ret = bdrv_is_allocated_above(backing_bs(bs), base,
-                                          sector_num, n, &n);
+                                          offset / BDRV_SECTOR_SIZE, n, &n);

             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
-                n = end - sector_num;
+                n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
             }

             copy = (ret == 1);
         }
-        trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-                                   n * BDRV_SECTOR_SIZE, ret);
+        trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
-                                  n * BDRV_SECTOR_SIZE, buf);
+            ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
         }
         if (ret < 0) {
             BlockErrorAction action =
@@ -211,7 +207,7 @@ out:
     /* Modify backing chain and close BDSes in main loop */
     data = g_malloc(sizeof(*data));
     data->ret = ret;
-    data->reached_end = sector_num == end;
+    data->reached_end = offset == s->common.len;
     block_job_defer_to_main_loop(&s->common, stream_complete, data);
 }

-- 
2.9.3


* [Qemu-devel] [PATCH 05/17] commit: Switch commit_populate() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (3 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 04/17] stream: Switch stream_run() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 06/17] commit: Switch commit_run() " Eric Blake
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/commit.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 11dab98..d244c62 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -47,26 +47,25 @@ typedef struct CommitBlockJob {
 } CommitBlockJob;

 static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
-                                        int64_t sector_num, int nb_sectors,
+                                        int64_t offset, uint64_t bytes,
                                         void *buf)
 {
     int ret = 0;
     QEMUIOVector qiov;
     struct iovec iov = {
         .iov_base = buf,
-        .iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+        .iov_len = bytes,
     };

+    assert(bytes < SIZE_MAX);
     qemu_iovec_init_external(&qiov, &iov, 1);

-    ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
-                        qiov.size, &qiov, 0);
+    ret = blk_co_preadv(bs, offset, qiov.size, &qiov, 0);
     if (ret < 0) {
         return ret;
     }

-    ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
-                         qiov.size, &qiov, 0);
+    ret = blk_co_pwritev(base, offset, qiov.size, &qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -179,7 +178,9 @@ static void coroutine_fn commit_run(void *opaque)
         trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
                                    n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = commit_populate(s->top, s->base, sector_num, n, buf);
+            ret = commit_populate(s->top, s->base,
+                                  sector_num * BDRV_SECTOR_SIZE,
+                                  n * BDRV_SECTOR_SIZE, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
         }
         if (ret < 0) {
-- 
2.9.3


* [Qemu-devel] [PATCH 06/17] commit: Switch commit_run() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (4 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 05/17] commit: Switch commit_populate() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 07/17] mirror: Switch MirrorBlockJob " Eric Blake
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of committing to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/commit.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index d244c62..b7d3e60 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -129,17 +129,16 @@ static void coroutine_fn commit_run(void *opaque)
 {
     CommitBlockJob *s = opaque;
     CommitCompleteData *data;
-    int64_t sector_num, end;
+    int64_t offset;
     uint64_t delay_ns = 0;
     int ret = 0;
-    int n = 0;
+    int n = 0; /* sectors */
     void *buf = NULL;
     int bytes_written = 0;
     int64_t base_len;

     ret = s->common.len = blk_getlength(s->top);

-
     if (s->common.len < 0) {
         goto out;
     }
@@ -156,10 +155,9 @@ static void coroutine_fn commit_run(void *opaque)
         }
     }

-    end = s->common.len >> BDRV_SECTOR_BITS;
     buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-    for (sector_num = 0; sector_num < end; sector_num += n) {
+    for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
         bool copy;

         /* Note that even when no rate limit is applied we need to yield
@@ -171,15 +169,13 @@ static void coroutine_fn commit_run(void *opaque)
         }
         /* Copy if allocated above the base */
         ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-                                      sector_num,
+                                      offset / BDRV_SECTOR_SIZE,
                                       COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
                                       &n);
         copy = (ret == 1);
-        trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-                                   n * BDRV_SECTOR_SIZE, ret);
+        trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
         if (copy) {
-            ret = commit_populate(s->top, s->base,
-                                  sector_num * BDRV_SECTOR_SIZE,
+            ret = commit_populate(s->top, s->base, offset,
                                   n * BDRV_SECTOR_SIZE, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
         }
-- 
2.9.3


* [Qemu-devel] [PATCH 07/17] mirror: Switch MirrorBlockJob to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (5 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 06/17] commit: Switch commit_run() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 08/17] mirror: Switch mirror_do_zero_or_discard() " Eric Blake
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to the
buffer size.

[checkpatch has a false positive on use of MIN() in this patch]

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 79 ++++++++++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d83d482..c923520 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -24,9 +24,8 @@

 #define SLICE_TIME    100000000ULL /* ns */
 #define MAX_IN_FLIGHT 16
-#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 Mb */
-#define DEFAULT_MIRROR_BUF_SIZE \
-    (MAX_IN_FLIGHT * MAX_IO_SECTORS * BDRV_SECTOR_SIZE)
+#define MAX_IO_BYTES (1 << 20) /* 1 Mb */
+#define DEFAULT_MIRROR_BUF_SIZE (MAX_IN_FLIGHT * MAX_IO_BYTES)

 /* The mirroring buffer is a list of granularity-sized chunks.
  * Free chunks are organized in a list.
@@ -67,11 +66,11 @@ typedef struct MirrorBlockJob {
     uint64_t last_pause_ns;
     unsigned long *in_flight_bitmap;
     int in_flight;
-    int64_t sectors_in_flight;
+    int64_t bytes_in_flight;
     int ret;
     bool unmap;
     bool waiting_for_io;
-    int target_cluster_sectors;
+    int target_cluster_size;
     int max_iov;
     bool initial_zeroing_ongoing;
 } MirrorBlockJob;
@@ -79,8 +78,8 @@ typedef struct MirrorBlockJob {
 typedef struct MirrorOp {
     MirrorBlockJob *s;
     QEMUIOVector qiov;
-    int64_t sector_num;
-    int nb_sectors;
+    int64_t offset;
+    uint64_t bytes;
 } MirrorOp;

 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
@@ -101,13 +100,12 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     MirrorBlockJob *s = op->s;
     struct iovec *iov;
     int64_t chunk_num;
-    int i, nb_chunks, sectors_per_chunk;
+    int i, nb_chunks;

-    trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
-                                op->nb_sectors * BDRV_SECTOR_SIZE, ret);
+    trace_mirror_iteration_done(s, op->offset, op->bytes, ret);

     s->in_flight--;
-    s->sectors_in_flight -= op->nb_sectors;
+    s->bytes_in_flight -= op->bytes;
     iov = op->qiov.iov;
     for (i = 0; i < op->qiov.niov; i++) {
         MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
@@ -115,16 +113,15 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
         s->buf_free_count++;
     }

-    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-    chunk_num = op->sector_num / sectors_per_chunk;
-    nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
+    chunk_num = op->offset / s->granularity;
+    nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
     bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
     if (ret >= 0) {
         if (s->cow_bitmap) {
             bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
         }
         if (!s->initial_zeroing_ongoing) {
-            s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
+            s->common.offset += op->bytes;
         }
     }
     qemu_iovec_destroy(&op->qiov);
@@ -144,7 +141,8 @@ static void mirror_write_complete(void *opaque, int ret)
     if (ret < 0) {
         BlockErrorAction action;

-        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+                              op->bytes >> BDRV_SECTOR_BITS);
         action = mirror_error_action(s, false, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
             s->ret = ret;
@@ -163,7 +161,8 @@ static void mirror_read_complete(void *opaque, int ret)
     if (ret < 0) {
         BlockErrorAction action;

-        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+                              op->bytes >> BDRV_SECTOR_BITS);
         action = mirror_error_action(s, true, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
             s->ret = ret;
@@ -171,7 +170,7 @@ static void mirror_read_complete(void *opaque, int ret)

         mirror_iteration_done(op, ret);
     } else {
-        blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
+        blk_aio_pwritev(s->target, op->offset, &op->qiov,
                         0, mirror_write_complete, op);
     }
     aio_context_release(blk_get_aio_context(s->common.blk));
@@ -211,7 +210,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
         align_nb_sectors = max_sectors;
         if (need_cow) {
             align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
-                                               s->target_cluster_sectors);
+                                               s->target_cluster_size >>
+                                               BDRV_SECTOR_BITS);
         }
     }
     /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
@@ -277,8 +277,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_new(MirrorOp, 1);
     op->s = s;
-    op->sector_num = sector_num;
-    op->nb_sectors = nb_sectors;
+    op->offset = sector_num * BDRV_SECTOR_SIZE;
+    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;

     /* Now make a QEMUIOVector taking enough granularity-sized chunks
      * from s->buf_free.
@@ -295,7 +295,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,

     /* Copy the dirty cluster.  */
     s->in_flight++;
-    s->sectors_in_flight += nb_sectors;
+    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
     trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
                                nb_sectors * BDRV_SECTOR_SIZE);

@@ -315,19 +315,17 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
      * so the freeing in mirror_iteration_done is nop. */
     op = g_new0(MirrorOp, 1);
     op->s = s;
-    op->sector_num = sector_num;
-    op->nb_sectors = nb_sectors;
+    op->offset = sector_num * BDRV_SECTOR_SIZE;
+    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;

     s->in_flight++;
-    s->sectors_in_flight += nb_sectors;
+    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
     if (is_discard) {
         blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
-                         op->nb_sectors << BDRV_SECTOR_BITS,
-                         mirror_write_complete, op);
+                         op->bytes, mirror_write_complete, op);
     } else {
         blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
-                              op->nb_sectors * BDRV_SECTOR_SIZE,
-                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
+                              op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
                               mirror_write_complete, op);
     }
 }
@@ -342,8 +340,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
     int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
-    int max_io_sectors = MAX((s->buf_size >> BDRV_SECTOR_BITS) / MAX_IN_FLIGHT,
-                             MAX_IO_SECTORS);
+    int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

     sector_num = bdrv_dirty_iter_next(s->dbi);
     if (sector_num < 0) {
@@ -410,9 +407,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
                                           nb_chunks * sectors_per_chunk,
                                           &io_sectors, &file);
         if (ret < 0) {
-            io_sectors = MIN(nb_chunks * sectors_per_chunk, max_io_sectors);
+            io_sectors = MIN(nb_chunks * sectors_per_chunk,
+                             max_io_bytes >> BDRV_SECTOR_BITS);
         } else if (ret & BDRV_BLOCK_DATA) {
-            io_sectors = MIN(io_sectors, max_io_sectors);
+            io_sectors = MIN(io_sectors, max_io_bytes >> BDRV_SECTOR_BITS);
         }

         io_sectors -= io_sectors % sectors_per_chunk;
@@ -707,7 +705,6 @@ static void coroutine_fn mirror_run(void *opaque)
     char backing_filename[2]; /* we only need 2 characters because we are only
                                  checking for a NULL string */
     int ret = 0;
-    int target_cluster_size = BDRV_SECTOR_SIZE;

     if (block_job_is_cancelled(&s->common)) {
         goto immediate_exit;
@@ -756,17 +753,17 @@ static void coroutine_fn mirror_run(void *opaque)
      * the destination do COW.  Instead, we copy sectors around the
      * dirty data if needed.  We need a bitmap to do that.
      */
+    s->target_cluster_size = BDRV_SECTOR_SIZE;
     bdrv_get_backing_filename(target_bs, backing_filename,
                               sizeof(backing_filename));
     if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
-        target_cluster_size = bdi.cluster_size;
+        s->target_cluster_size = bdi.cluster_size;
     }
     if (backing_filename[0] && !target_bs->backing
-        && s->granularity < target_cluster_size) {
-        s->buf_size = MAX(s->buf_size, target_cluster_size);
+        && s->granularity < s->target_cluster_size) {
+        s->buf_size = MAX(s->buf_size, s->target_cluster_size);
         s->cow_bitmap = bitmap_new(length);
     }
-    s->target_cluster_sectors = target_cluster_size >> BDRV_SECTOR_BITS;
     s->max_iov = MIN(bs->bl.max_iov, target_bs->bl.max_iov);

     s->buf = qemu_try_blockalign(bs, s->buf_size);
@@ -802,10 +799,10 @@ static void coroutine_fn mirror_run(void *opaque)
         cnt = bdrv_get_dirty_count(s->dirty_bitmap);
         /* s->common.offset contains the number of bytes already processed so
          * far, cnt is the number of dirty sectors remaining and
-         * s->sectors_in_flight is the number of sectors currently being
+         * s->bytes_in_flight is the number of bytes currently being
          * processed; together those are the current total operation length */
-        s->common.len = s->common.offset +
-                        (cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
+        s->common.len = s->common.offset + s->bytes_in_flight +
+            cnt * BDRV_SECTOR_SIZE;

         /* Note that even when no rate limit is applied we need to yield
          * periodically with no pending I/O so that bdrv_drain_all() returns.
-- 
2.9.3


* [Qemu-devel] [PATCH 08/17] mirror: Switch mirror_do_zero_or_discard() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (6 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 07/17] mirror: Switch MirrorBlockJob " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 09/17] mirror: Switch mirror_cow_align() " Eric Blake
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index c923520..9a6f474 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -305,8 +305,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
 }

 static void mirror_do_zero_or_discard(MirrorBlockJob *s,
-                                      int64_t sector_num,
-                                      int nb_sectors,
+                                      int64_t offset,
+                                      uint64_t bytes,
                                       bool is_discard)
 {
     MirrorOp *op;
@@ -315,16 +315,16 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
      * so the freeing in mirror_iteration_done is nop. */
     op = g_new0(MirrorOp, 1);
     op->s = s;
-    op->offset = sector_num * BDRV_SECTOR_SIZE;
-    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+    op->offset = offset;
+    op->bytes = bytes;

     s->in_flight++;
-    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
+    s->bytes_in_flight += bytes;
     if (is_discard) {
-        blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
+        blk_aio_pdiscard(s->target, offset,
                          op->bytes, mirror_write_complete, op);
     } else {
-        blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
+        blk_aio_pwrite_zeroes(s->target, offset,
                               op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
                               mirror_write_complete, op);
     }
@@ -448,7 +448,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
-            mirror_do_zero_or_discard(s, sector_num, io_sectors,
+            mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+                                      io_sectors * BDRV_SECTOR_SIZE,
                                       mirror_method == MIRROR_METHOD_DISCARD);
             if (write_zeroes_ok) {
                 io_bytes_acct = 0;
@@ -645,7 +646,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
                 continue;
             }

-            mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
+            mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+                                      nb_sectors * BDRV_SECTOR_SIZE, false);
             sector_num += nb_sectors;
         }

-- 
2.9.3


* [Qemu-devel] [PATCH 09/17] mirror: Switch mirror_cow_align() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (7 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 08/17] mirror: Switch mirror_do_zero_or_discard() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 10/17] mirror: Switch mirror_do_read() " Eric Blake
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and add mirror_clip_bytes() as a
counterpart to mirror_clip_sectors().  Some of the conversion is
a bit tricky, requiring temporaries to convert between units; the
temporaries will be cleaned up in a later patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 63 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 25 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9a6f474..84b8cca 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,6 +176,15 @@ static void mirror_read_complete(void *opaque, int ret)
     aio_context_release(blk_get_aio_context(s->common.blk));
 }

+/* Clip bytes relative to offset to not exceed end-of-file */
+static inline void mirror_clip_bytes(MirrorBlockJob *s,
+                                     int64_t offset,
+                                     unsigned int *bytes)
+{
+    *bytes = MIN(*bytes, s->bdev_length - offset);
+}
+
+/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
 static inline void mirror_clip_sectors(MirrorBlockJob *s,
                                        int64_t sector_num,
                                        int *nb_sectors)
@@ -184,43 +193,39 @@ static inline void mirror_clip_sectors(MirrorBlockJob *s,
                       s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

-/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
- * return the offset of the adjusted tail sector against original. */
-static int mirror_cow_align(MirrorBlockJob *s,
-                            int64_t *sector_num,
-                            int *nb_sectors)
+/* Round offset and/or bytes to target cluster if COW is needed, and
+ * return the offset of the adjusted tail against original. */
+static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
+                            unsigned int *bytes)
 {
     bool need_cow;
     int ret = 0;
-    int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
-    int64_t align_sector_num = *sector_num;
-    int align_nb_sectors = *nb_sectors;
-    int max_sectors = chunk_sectors * s->max_iov;
+    int64_t align_offset = *offset;
+    unsigned int align_bytes = *bytes;
+    int max_bytes = s->granularity * s->max_iov;

-    need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
-    need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
+    need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
+    need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
                           s->cow_bitmap);
     if (need_cow) {
-        bdrv_round_sectors_to_clusters(blk_bs(s->target), *sector_num,
-                                       *nb_sectors, &align_sector_num,
-                                       &align_nb_sectors);
+        bdrv_round_to_clusters(blk_bs(s->target), *offset, *bytes,
+                               &align_offset, &align_bytes);
     }

-    if (align_nb_sectors > max_sectors) {
-        align_nb_sectors = max_sectors;
+    if (align_bytes > max_bytes) {
+        align_bytes = max_bytes;
         if (need_cow) {
-            align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
-                                               s->target_cluster_size >>
-                                               BDRV_SECTOR_BITS);
+            align_bytes = QEMU_ALIGN_DOWN(align_bytes,
+                                          s->target_cluster_size);
         }
     }
-    /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
+    /* Clipping may result in align_bytes unaligned to chunk boundary, but
      * that doesn't matter because it's already the end of source image. */
-    mirror_clip_sectors(s, align_sector_num, &align_nb_sectors);
+    mirror_clip_bytes(s, align_offset, &align_bytes);

-    ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
-    *sector_num = align_sector_num;
-    *nb_sectors = align_nb_sectors;
+    ret = align_offset + align_bytes - (*offset + *bytes);
+    *offset = align_offset;
+    *bytes = align_bytes;
     assert(ret >= 0);
     return ret;
 }
@@ -256,10 +261,18 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
     nb_sectors = MIN(max_sectors, nb_sectors);
     assert(nb_sectors);
+    assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
     ret = nb_sectors;

     if (s->cow_bitmap) {
-        ret += mirror_cow_align(s, &sector_num, &nb_sectors);
+        int64_t offset = sector_num * BDRV_SECTOR_SIZE;
+        unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
+        int gap;
+
+        gap = mirror_cow_align(s, &offset, &bytes);
+        sector_num = offset / BDRV_SECTOR_SIZE;
+        nb_sectors = bytes / BDRV_SECTOR_SIZE;
+        ret += gap / BDRV_SECTOR_SIZE;
     }
     assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
     /* The sector range must meet granularity because:
-- 
2.9.3


* [Qemu-devel] [PATCH 10/17] mirror: Switch mirror_do_read() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (8 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 09/17] mirror: Switch mirror_cow_align() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 11/17] mirror: Switch mirror_iteration() " Eric Blake
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 75 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 84b8cca..da23b70 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -196,7 +196,7 @@ static inline void mirror_clip_sectors(MirrorBlockJob *s,
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
-                            unsigned int *bytes)
+                            uint64_t *bytes)
 {
     bool need_cow;
     int ret = 0;
@@ -204,6 +204,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
     unsigned int align_bytes = *bytes;
     int max_bytes = s->granularity * s->max_iov;

+    assert(*bytes < INT_MAX);
     need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
     need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
                           s->cow_bitmap);
@@ -239,59 +240,50 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
 }

 /* Submit async read while handling COW.
- * Returns: The number of sectors copied after and including sector_num,
- *          excluding any sectors copied prior to sector_num due to alignment.
- *          This will be nb_sectors if no alignment is necessary, or
- *          (new_end - sector_num) if tail is rounded up or down due to
+ * Returns: The number of bytes copied after and including offset,
+ *          excluding any bytes copied prior to offset due to alignment.
+ *          This will be @bytes if no alignment is necessary, or
+ *          (new_end - offset) if tail is rounded up or down due to
  *          alignment or buffer limit.
  */
-static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
-                          int nb_sectors)
+static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
+                               uint64_t bytes)
 {
     BlockBackend *source = s->common.blk;
-    int sectors_per_chunk, nb_chunks;
-    int ret;
+    int nb_chunks;
+    uint64_t ret;
     MirrorOp *op;
-    int max_sectors;
+    uint64_t max_bytes;

-    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-    max_sectors = sectors_per_chunk * s->max_iov;
+    max_bytes = s->granularity * s->max_iov;

     /* We can only handle as much as buf_size at a time. */
-    nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
-    nb_sectors = MIN(max_sectors, nb_sectors);
-    assert(nb_sectors);
-    assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
-    ret = nb_sectors;
+    bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
+    assert(bytes);
+    assert(bytes < BDRV_REQUEST_MAX_BYTES);
+    ret = bytes;

     if (s->cow_bitmap) {
-        int64_t offset = sector_num * BDRV_SECTOR_SIZE;
-        unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
-        int gap;
-
-        gap = mirror_cow_align(s, &offset, &bytes);
-        sector_num = offset / BDRV_SECTOR_SIZE;
-        nb_sectors = bytes / BDRV_SECTOR_SIZE;
-        ret += gap / BDRV_SECTOR_SIZE;
+        ret += mirror_cow_align(s, &offset, &bytes);
     }
-    assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
-    /* The sector range must meet granularity because:
+    assert(bytes <= s->buf_size);
+    /* The range will be sector-aligned because:
      * 1) Caller passes in aligned values;
-     * 2) mirror_cow_align is used only when target cluster is larger. */
-    assert(!(sector_num % sectors_per_chunk));
-    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+     * 2) mirror_cow_align is used only when target cluster is larger.
+     * But it might not be cluster-aligned at end-of-file. */
+    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+    nb_chunks = DIV_ROUND_UP(bytes, s->granularity);

     while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
-                                     s->in_flight);
+        trace_mirror_yield_in_flight(s, offset, s->in_flight);
         mirror_wait_for_io(s);
     }

     /* Allocate a MirrorOp that is used as an AIO callback.  */
     op = g_new(MirrorOp, 1);
     op->s = s;
-    op->offset = sector_num * BDRV_SECTOR_SIZE;
-    op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+    op->offset = offset;
+    op->bytes = bytes;

     /* Now make a QEMUIOVector taking enough granularity-sized chunks
      * from s->buf_free.
@@ -299,7 +291,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
     qemu_iovec_init(&op->qiov, nb_chunks);
     while (nb_chunks-- > 0) {
         MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
-        size_t remaining = nb_sectors * BDRV_SECTOR_SIZE - op->qiov.size;
+        size_t remaining = bytes - op->qiov.size;

         QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
         s->buf_free_count--;
@@ -308,12 +300,10 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,

     /* Copy the dirty cluster.  */
     s->in_flight++;
-    s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
-    trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-                               nb_sectors * BDRV_SECTOR_SIZE);
+    s->bytes_in_flight += bytes;
+    trace_mirror_one_iteration(s, offset, bytes);

-    blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov, 0,
-                   mirror_read_complete, op);
+    blk_aio_preadv(source, offset, &op->qiov, 0, mirror_read_complete, op);
     return ret;
 }

@@ -456,8 +446,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         mirror_clip_sectors(s, sector_num, &io_sectors);
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
-            io_sectors = mirror_do_read(s, sector_num, io_sectors);
-            io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
+            io_bytes_acct = mirror_do_read(s, sector_num * BDRV_SECTOR_SIZE,
+                                           io_sectors * BDRV_SECTOR_SIZE);
+            io_sectors = io_bytes_acct / BDRV_SECTOR_SIZE;
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
-- 
2.9.3


* [Qemu-devel] [PATCH 11/17] mirror: Switch mirror_iteration() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (9 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 10/17] mirror: Switch mirror_do_read() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 12/17] backup: Switch BackupBlockJob " Eric Blake
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of mirroring to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are both sector-aligned and multiples of the granularity).  Drop
the now-unused mirror_clip_sectors().
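
The core of the conversion is visible in the chunk math: the chunk index that used to be derived from a sector number and sectors_per_chunk now comes straight from the byte offset and the granularity. A minimal sketch (the helper names and the 64 KiB granularity below are illustrative, not QEMU's):

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512  /* QEMU's fixed sector size */

/* Old style: chunk index from a sector number. */
int64_t chunk_from_sector(int64_t sector_num, int sectors_per_chunk)
{
    return sector_num / sectors_per_chunk;
}

/* New style: chunk index straight from a byte offset; granularity is
 * the job's chunk size in bytes (a power of two, >= 512). */
int64_t chunk_from_offset(int64_t offset, uint32_t granularity)
{
    return offset / granularity;
}
```

Both give the same chunk for sector-aligned input, which is why the in_flight_bitmap indexing is unaffected by the change of units.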

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 103 +++++++++++++++++++++++++--------------------------------
 1 file changed, 45 insertions(+), 58 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index da23b70..8de0492 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -184,15 +184,6 @@ static inline void mirror_clip_bytes(MirrorBlockJob *s,
     *bytes = MIN(*bytes, s->bdev_length - offset);
 }

-/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
-static inline void mirror_clip_sectors(MirrorBlockJob *s,
-                                       int64_t sector_num,
-                                       int *nb_sectors)
-{
-    *nb_sectors = MIN(*nb_sectors,
-                      s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
-}
-
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
@@ -336,28 +327,26 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->source;
-    int64_t sector_num, first_chunk;
+    int64_t offset, first_chunk;
     uint64_t delay_ns = 0;
     /* At least the first dirty chunk is mirrored in one iteration. */
     int nb_chunks = 1;
-    int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
     int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
     int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

-    sector_num = bdrv_dirty_iter_next(s->dbi);
-    if (sector_num < 0) {
+    offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+    if (offset < 0) {
         bdrv_set_dirty_iter(s->dbi, 0);
-        sector_num = bdrv_dirty_iter_next(s->dbi);
+        offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
         trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
                                   BDRV_SECTOR_SIZE);
-        assert(sector_num >= 0);
+        assert(offset >= 0);
     }

-    first_chunk = sector_num / sectors_per_chunk;
+    first_chunk = offset / s->granularity;
     while (test_bit(first_chunk, s->in_flight_bitmap)) {
-        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
-                                     s->in_flight);
+        trace_mirror_yield_in_flight(s, offset, s->in_flight);
         mirror_wait_for_io(s);
     }

@@ -365,25 +354,26 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)

     /* Find the number of consecutive dirty chunks following the first dirty
      * one, and wait for in flight requests in them. */
-    while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
+    while (nb_chunks * s->granularity < s->buf_size) {
         int64_t next_dirty;
-        int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
-        int64_t next_chunk = next_sector / sectors_per_chunk;
-        if (next_sector >= end ||
-            !bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
+        int64_t next_offset = offset + nb_chunks * s->granularity;
+        int64_t next_chunk = next_offset / s->granularity;
+        if (next_offset >= s->bdev_length ||
+            !bdrv_get_dirty(source, s->dirty_bitmap,
+                            next_offset >> BDRV_SECTOR_BITS)) {
             break;
         }
         if (test_bit(next_chunk, s->in_flight_bitmap)) {
             break;
         }

-        next_dirty = bdrv_dirty_iter_next(s->dbi);
-        if (next_dirty > next_sector || next_dirty < 0) {
+        next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+        if (next_dirty > next_offset || next_dirty < 0) {
             /* The bitmap iterator's cache is stale, refresh it */
-            bdrv_set_dirty_iter(s->dbi, next_sector);
-            next_dirty = bdrv_dirty_iter_next(s->dbi);
+            bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
+            next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
         }
-        assert(next_dirty == next_sector);
+        assert(next_dirty == next_offset);
         nb_chunks++;
     }

@@ -391,12 +381,13 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
      * calling bdrv_get_block_status_above could yield - if some blocks are
      * marked dirty in this window, we need to know.
      */
-    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num,
+    bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset >> BDRV_SECTOR_BITS,
                             nb_chunks * sectors_per_chunk);
-    bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
-    while (nb_chunks > 0 && sector_num < end) {
+    bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
+    while (nb_chunks > 0 && offset < s->bdev_length) {
         int64_t ret;
         int io_sectors;
+        unsigned int io_bytes;
         int64_t io_bytes_acct;
         BlockDriverState *file;
         enum MirrorMethod {
@@ -405,28 +396,28 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             MIRROR_METHOD_DISCARD
         } mirror_method = MIRROR_METHOD_COPY;

-        assert(!(sector_num % sectors_per_chunk));
-        ret = bdrv_get_block_status_above(source, NULL, sector_num,
+        assert(!(offset % s->granularity));
+        ret = bdrv_get_block_status_above(source, NULL,
+                                          offset >> BDRV_SECTOR_BITS,
                                           nb_chunks * sectors_per_chunk,
                                           &io_sectors, &file);
+        io_bytes = io_sectors * BDRV_SECTOR_SIZE;
         if (ret < 0) {
-            io_sectors = MIN(nb_chunks * sectors_per_chunk,
-                             max_io_bytes >> BDRV_SECTOR_BITS);
+            io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
         } else if (ret & BDRV_BLOCK_DATA) {
-            io_sectors = MIN(io_sectors, max_io_bytes >> BDRV_SECTOR_BITS);
+            io_bytes = MIN(io_bytes, max_io_bytes);
         }

-        io_sectors -= io_sectors % sectors_per_chunk;
-        if (io_sectors < sectors_per_chunk) {
-            io_sectors = sectors_per_chunk;
+        io_bytes -= io_bytes % s->granularity;
+        if (io_bytes < s->granularity) {
+            io_bytes = s->granularity;
         } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
-            int64_t target_sector_num;
-            int target_nb_sectors;
-            bdrv_round_sectors_to_clusters(blk_bs(s->target), sector_num,
-                                           io_sectors,  &target_sector_num,
-                                           &target_nb_sectors);
-            if (target_sector_num == sector_num &&
-                target_nb_sectors == io_sectors) {
+            int64_t target_offset;
+            unsigned int target_bytes;
+            bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
+                                   &target_offset, &target_bytes);
+            if (target_offset == offset &&
+                target_bytes == io_bytes) {
                 mirror_method = ret & BDRV_BLOCK_ZERO ?
                                     MIRROR_METHOD_ZERO :
                                     MIRROR_METHOD_DISCARD;
@@ -434,8 +425,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         }

         while (s->in_flight >= MAX_IN_FLIGHT) {
-            trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
-                                         s->in_flight);
+            trace_mirror_yield_in_flight(s, offset, s->in_flight);
             mirror_wait_for_io(s);
         }

@@ -443,30 +433,27 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             return 0;
         }

-        mirror_clip_sectors(s, sector_num, &io_sectors);
+        mirror_clip_bytes(s, offset, &io_bytes);
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
-            io_bytes_acct = mirror_do_read(s, sector_num * BDRV_SECTOR_SIZE,
-                                           io_sectors * BDRV_SECTOR_SIZE);
-            io_sectors = io_bytes_acct / BDRV_SECTOR_SIZE;
+            io_bytes = io_bytes_acct = mirror_do_read(s, offset, io_bytes);
             break;
         case MIRROR_METHOD_ZERO:
         case MIRROR_METHOD_DISCARD:
-            mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
-                                      io_sectors * BDRV_SECTOR_SIZE,
+            mirror_do_zero_or_discard(s, offset, io_bytes,
                                       mirror_method == MIRROR_METHOD_DISCARD);
             if (write_zeroes_ok) {
                 io_bytes_acct = 0;
             } else {
-                io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
+                io_bytes_acct = io_bytes;
             }
             break;
         default:
             abort();
         }
-        assert(io_sectors);
-        sector_num += io_sectors;
-        nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
+        assert(io_bytes);
+        offset += io_bytes;
+        nb_chunks -= DIV_ROUND_UP(io_bytes, s->granularity);
         if (s->common.speed) {
             delay_ns = ratelimit_calculate_delay(&s->limit, io_bytes_acct);
         }
-- 
2.9.3


* [Qemu-devel] [PATCH 12/17] backup: Switch BackupBlockJob to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (10 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 11/17] mirror: Switch mirror_iteration() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 13/17] backup: Switch block_backup.h " Eric Blake
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to
tracking progress.  Drop a redundant local variable bytes_per_cluster.
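
One concrete simplification: the per-cluster copy length n is now computed in bytes, so the short final cluster falls out of a single MIN. A sketch of that calculation (MIN and the sizes here are illustrative stand-ins; in QEMU this uses job->cluster_size and job->common.len):

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Bytes to copy for a given cluster index: a full cluster_size except
 * for the final, possibly short, cluster of a total_len-byte device. */
int64_t cluster_copy_bytes(int64_t cluster, int64_t cluster_size,
                           int64_t total_len)
{
    return MIN(cluster_size, total_len - cluster * cluster_size);
}
```

With the old sector-based form, the same value needed a division by BDRV_SECTOR_SIZE on the way in and a multiplication on the way out.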

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/backup.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index b28df8c..a64a162 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -39,7 +39,7 @@ typedef struct BackupBlockJob {
     BlockdevOnError on_source_error;
     BlockdevOnError on_target_error;
     CoRwlock flush_rwlock;
-    uint64_t sectors_read;
+    uint64_t bytes_read;
     unsigned long *done_bitmap;
     int64_t cluster_size;
     bool compress;
@@ -102,16 +102,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     void *bounce_buffer = NULL;
     int ret = 0;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
-    int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
-    int64_t start, end;
-    int n;
+    int64_t start, end; /* clusters */
+    int n; /* bytes */

     qemu_co_rwlock_rdlock(&job->flush_rwlock);

     start = sector_num / sectors_per_cluster;
     end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-    trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+    trace_backup_do_cow_enter(job, start * job->cluster_size,
                               sector_num * BDRV_SECTOR_SIZE,
                               nb_sectors * BDRV_SECTOR_SIZE);

@@ -120,28 +119,27 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,

     for (; start < end; start++) {
         if (test_bit(start, job->done_bitmap)) {
-            trace_backup_do_cow_skip(job, start * bytes_per_cluster);
+            trace_backup_do_cow_skip(job, start * job->cluster_size);
             continue; /* already copied */
         }

-        trace_backup_do_cow_process(job, start * bytes_per_cluster);
+        trace_backup_do_cow_process(job, start * job->cluster_size);

-        n = MIN(sectors_per_cluster,
-                job->common.len / BDRV_SECTOR_SIZE -
-                start * sectors_per_cluster);
+        n = MIN(job->cluster_size,
+                job->common.len - start * job->cluster_size);

         if (!bounce_buffer) {
             bounce_buffer = blk_blockalign(blk, job->cluster_size);
         }
         iov.iov_base = bounce_buffer;
-        iov.iov_len = n * BDRV_SECTOR_SIZE;
+        iov.iov_len = n;
         qemu_iovec_init_external(&bounce_qiov, &iov, 1);

         ret = blk_co_preadv(blk, start * job->cluster_size,
                             bounce_qiov.size, &bounce_qiov,
                             is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
+            trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
@@ -157,7 +155,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
         }
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
+            trace_backup_do_cow_write_fail(job, start * job->cluster_size, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
@@ -169,8 +167,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         /* Publish progress, guest I/O counts as progress too.  Note that the
          * offset field is an opaque progress value, it is not a disk offset.
          */
-        job->sectors_read += n;
-        job->common.offset += n * BDRV_SECTOR_SIZE;
+        job->bytes_read += n;
+        job->common.offset += n;
     }

 out:
@@ -363,9 +361,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
      */
     if (job->common.speed) {
         uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-                                                      job->sectors_read *
-                                                      BDRV_SECTOR_SIZE);
-        job->sectors_read = 0;
+                                                      job->bytes_read);
+        job->bytes_read = 0;
         block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
     } else {
         block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
-- 
2.9.3


* [Qemu-devel] [PATCH 13/17] backup: Switch block_backup.h to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (11 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 12/17] backup: Switch BackupBlockJob " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-17 23:24   ` [Qemu-devel] [Qemu-block] " John Snow
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 14/17] backup: Switch backup_do_cow() " Eric Blake
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Jeff Cody, Max Reitz, Wen Congyang, Changlong Xie

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
the public interface to backup jobs (no semantic change), including
a change to CowRequest to track by bytes instead of cluster indices.
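
The key mechanical change is that cluster rounding moves from sector division to byte alignment. A sketch of the new entry-point behaviour, with local stand-ins for QEMU's QEMU_ALIGN_DOWN/QEMU_ALIGN_UP macros (from qemu/osdep.h); the function name is illustrative:

```c
#include <stdint.h>

/* Local stand-ins for QEMU_ALIGN_DOWN / QEMU_ALIGN_UP. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/* Widen a byte-based request [offset, offset + bytes) to the cluster
 * boundaries that backup_wait_for_overlapping_requests() now passes
 * down to wait_for_overlapping_requests(). */
void request_cluster_bounds(int64_t offset, uint64_t bytes,
                            int64_t cluster_size,
                            int64_t *start, int64_t *end)
{
    *start = ALIGN_DOWN(offset, cluster_size);
    *end = ALIGN_UP(offset + (int64_t)bytes, cluster_size);
}
```

Note that start and end are now byte offsets rather than cluster indices, matching the new CowRequest fields.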

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block_backup.h | 11 +++++------
 block/backup.c               | 29 ++++++++++++++---------------
 block/replication.c          | 12 ++++++++----
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 8a75947..994a3bd 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -21,17 +21,16 @@
 #include "block/block_int.h"

 typedef struct CowRequest {
-    int64_t start;
-    int64_t end;
+    int64_t start_byte;
+    int64_t end_byte;
     QLIST_ENTRY(CowRequest) list;
     CoQueue wait_queue; /* coroutines blocked on this request */
 } CowRequest;

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-                                          int nb_sectors);
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+                                          uint64_t bytes);
 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-                              int64_t sector_num,
-                              int nb_sectors);
+                              int64_t offset, uint64_t bytes);
 void backup_cow_request_end(CowRequest *req);

 void backup_do_checkpoint(BlockJob *job, Error **errp);
diff --git a/block/backup.c b/block/backup.c
index a64a162..0502c1a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -64,7 +64,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
     do {
         retry = false;
         QLIST_FOREACH(req, &job->inflight_reqs, list) {
-            if (end > req->start && start < req->end) {
+            if (end > req->start_byte && start < req->end_byte) {
                 qemu_co_queue_wait(&req->wait_queue, NULL);
                 retry = true;
                 break;
@@ -77,8 +77,8 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
 static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
                                      int64_t start, int64_t end)
 {
-    req->start = start;
-    req->end = end;
+    req->start_byte = start;
+    req->end_byte = end;
     qemu_co_queue_init(&req->wait_queue);
     QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
 }
@@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                               sector_num * BDRV_SECTOR_SIZE,
                               nb_sectors * BDRV_SECTOR_SIZE);

-    wait_for_overlapping_requests(job, start, end);
-    cow_request_begin(&cow_request, job, start, end);
+    wait_for_overlapping_requests(job, start * job->cluster_size,
+                                  end * job->cluster_size);
+    cow_request_begin(&cow_request, job, start * job->cluster_size,
+                      end * job->cluster_size);

     for (; start < end; start++) {
         if (test_bit(start, job->done_bitmap)) {
@@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
     bitmap_zero(backup_job->done_bitmap, len);
 }

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-                                          int nb_sectors)
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+                                          uint64_t bytes)
 {
     BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
     int64_t start, end;

     assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);

-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
+    end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
     wait_for_overlapping_requests(backup_job, start, end);
 }

 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-                              int64_t sector_num,
-                              int nb_sectors)
+                              int64_t offset, uint64_t bytes)
 {
     BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
     int64_t start, end;

     assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);

-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
+    end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
     cow_request_begin(req, backup_job, start, end);
 }

diff --git a/block/replication.c b/block/replication.c
index bf3c395..414ecc4 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -234,10 +234,14 @@ static coroutine_fn int replication_co_readv(BlockDriverState *bs,
     }

     if (job) {
-        backup_wait_for_overlapping_requests(child->bs->job, sector_num,
-                                             remaining_sectors);
-        backup_cow_request_begin(&req, child->bs->job, sector_num,
-                                 remaining_sectors);
+        uint64_t remaining_bytes = remaining_sectors * BDRV_SECTOR_SIZE;
+
+        backup_wait_for_overlapping_requests(child->bs->job,
+                                             sector_num * BDRV_SECTOR_SIZE,
+                                             remaining_bytes);
+        backup_cow_request_begin(&req, child->bs->job,
+                                 sector_num * BDRV_SECTOR_SIZE,
+                                 remaining_bytes);
         ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors,
                             qiov);
         backup_cow_request_end(&req);
-- 
2.9.3


* [Qemu-devel] [PATCH 14/17] backup: Switch backup_do_cow() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (12 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 13/17] backup: Switch block_backup.h " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 15/17] backup: Switch backup_run() " Eric Blake
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).
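
The reshaped loop can be sketched in isolation: byte offsets advance in cluster_size steps, and the done bitmap is indexed by start / cluster_size. This toy version (the bitmap is a plain byte array here, not QEMU's bitmap_* API) just counts the clusters that would be copied:

```c
#include <stdint.h>

/* Count clusters overlapping [offset, offset + bytes) not yet marked
 * done, mirroring the new backup_do_cow() loop shape. */
int count_clusters_to_copy(int64_t offset, uint64_t bytes,
                           int64_t cluster_size,
                           const unsigned char *done_bitmap)
{
    int64_t start = offset / cluster_size * cluster_size;        /* align down */
    int64_t end = (offset + (int64_t)bytes + cluster_size - 1)
                  / cluster_size * cluster_size;                 /* align up */
    int to_copy = 0;

    for (; start < end; start += cluster_size) {
        if (done_bitmap[start / cluster_size]) {
            continue;   /* already copied */
        }
        to_copy++;
    }
    return to_copy;
}
```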

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/backup.c | 62 ++++++++++++++++++++++++----------------------------------
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 0502c1a..0e87b22 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -91,7 +91,7 @@ static void cow_request_end(CowRequest *req)
 }

 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-                                      int64_t sector_num, int nb_sectors,
+                                      int64_t offset, uint64_t bytes,
                                       bool *error_is_read,
                                       bool is_write_notifier)
 {
@@ -101,34 +101,28 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     QEMUIOVector bounce_qiov;
     void *bounce_buffer = NULL;
     int ret = 0;
-    int64_t sectors_per_cluster = cluster_size_sectors(job);
-    int64_t start, end; /* clusters */
+    int64_t start, end; /* bytes */
     int n; /* bytes */

     qemu_co_rwlock_rdlock(&job->flush_rwlock);

-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
+    end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);

-    trace_backup_do_cow_enter(job, start * job->cluster_size,
-                              sector_num * BDRV_SECTOR_SIZE,
-                              nb_sectors * BDRV_SECTOR_SIZE);
+    trace_backup_do_cow_enter(job, start, offset, bytes);

-    wait_for_overlapping_requests(job, start * job->cluster_size,
-                                  end * job->cluster_size);
-    cow_request_begin(&cow_request, job, start * job->cluster_size,
-                      end * job->cluster_size);
+    wait_for_overlapping_requests(job, start, end);
+    cow_request_begin(&cow_request, job, start, end);

-    for (; start < end; start++) {
-        if (test_bit(start, job->done_bitmap)) {
-            trace_backup_do_cow_skip(job, start * job->cluster_size);
+    for (; start < end; start += job->cluster_size) {
+        if (test_bit(start / job->cluster_size, job->done_bitmap)) {
+            trace_backup_do_cow_skip(job, start);
             continue; /* already copied */
         }

-        trace_backup_do_cow_process(job, start * job->cluster_size);
+        trace_backup_do_cow_process(job, start);

-        n = MIN(job->cluster_size,
-                job->common.len - start * job->cluster_size);
+        n = MIN(job->cluster_size, job->common.len - start);

         if (!bounce_buffer) {
             bounce_buffer = blk_blockalign(blk, job->cluster_size);
@@ -137,11 +131,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         iov.iov_len = n;
         qemu_iovec_init_external(&bounce_qiov, &iov, 1);

-        ret = blk_co_preadv(blk, start * job->cluster_size,
-                            bounce_qiov.size, &bounce_qiov,
+        ret = blk_co_preadv(blk, start, bounce_qiov.size, &bounce_qiov,
                             is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
+            trace_backup_do_cow_read_fail(job, start, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
@@ -149,22 +142,22 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         }

         if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
-            ret = blk_co_pwrite_zeroes(job->target, start * job->cluster_size,
+            ret = blk_co_pwrite_zeroes(job->target, start,
                                        bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
         } else {
-            ret = blk_co_pwritev(job->target, start * job->cluster_size,
+            ret = blk_co_pwritev(job->target, start,
                                  bounce_qiov.size, &bounce_qiov,
                                  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
         }
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start * job->cluster_size, ret);
+            trace_backup_do_cow_write_fail(job, start, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
             goto out;
         }

-        set_bit(start, job->done_bitmap);
+        set_bit(start / job->cluster_size, job->done_bitmap);

         /* Publish progress, guest I/O counts as progress too.  Note that the
          * offset field is an opaque progress value, it is not a disk offset.
@@ -180,8 +173,7 @@ out:

     cow_request_end(&cow_request);

-    trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
-                               nb_sectors * BDRV_SECTOR_SIZE, ret);
+    trace_backup_do_cow_return(job, offset, bytes, ret);

     qemu_co_rwlock_unlock(&job->flush_rwlock);

@@ -194,14 +186,12 @@ static int coroutine_fn backup_before_write_notify(
 {
     BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
     BdrvTrackedRequest *req = opaque;
-    int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
-    int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;

     assert(req->bs == blk_bs(job->common.blk));
-    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
+    assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));

-    return backup_do_cow(job, sector_num, nb_sectors, NULL, true);
+    return backup_do_cow(job, req->offset, req->bytes, NULL, true);
 }

 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -406,8 +396,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
                 if (yield_and_check(job)) {
                     goto out;
                 }
-                ret = backup_do_cow(job, cluster * sectors_per_cluster,
-                                    sectors_per_cluster, &error_is_read,
+                ret = backup_do_cow(job, cluster * job->cluster_size,
+                                    job->cluster_size, &error_is_read,
                                     false);
                 if ((ret < 0) &&
                     backup_error_action(job, error_is_read, -ret) ==
@@ -509,8 +499,8 @@ static void coroutine_fn backup_run(void *opaque)
             if (alloced < 0) {
                 ret = alloced;
             } else {
-                ret = backup_do_cow(job, start * sectors_per_cluster,
-                                    sectors_per_cluster, &error_is_read,
+                ret = backup_do_cow(job, start * job->cluster_size,
+                                    job->cluster_size, &error_is_read,
                                     false);
             }
             if (ret < 0) {
-- 
2.9.3


* [Qemu-devel] [PATCH 15/17] backup: Switch backup_run() to byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (13 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 14/17] backup: Switch backup_do_cow() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based Eric Blake
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of backups to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are cluster-aligned).
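
The new loop shape is easy to check against the old cluster-index loop: walking offset in cluster_size steps performs exactly DIV_ROUND_UP(len, cluster_size) iterations. A small sketch (function names are illustrative):

```c
#include <stdint.h>

/* Iterations of: for (offset = 0; offset < len; offset += cluster_size) */
int64_t byte_loop_iterations(int64_t len, int64_t cluster_size)
{
    int64_t offset, n = 0;

    for (offset = 0; offset < len; offset += cluster_size) {
        n++;
    }
    return n;
}

/* The old loop bound, for comparison: DIV_ROUND_UP(len, cluster_size). */
int64_t old_loop_iterations(int64_t len, int64_t cluster_size)
{
    return (len + cluster_size - 1) / cluster_size;
}
```

The final iteration covers a short cluster when len is not cluster-aligned, which is why backup_do_cow() clamps its copy length internally.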

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/backup.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 0e87b22..2a51e8b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -370,11 +370,10 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     int ret = 0;
     int clusters_per_iter;
     uint32_t granularity;
-    int64_t sector;
+    int64_t offset;
     int64_t cluster;
     int64_t end;
     int64_t last_cluster = -1;
-    int64_t sectors_per_cluster = cluster_size_sectors(job);
     BdrvDirtyBitmapIter *dbi;

     granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
@@ -382,8 +381,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);

     /* Find the next dirty sector(s) */
-    while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
-        cluster = sector / sectors_per_cluster;
+    while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
+        cluster = offset / job->cluster_size;

         /* Fake progress updates for any clusters we skipped */
         if (cluster != last_cluster + 1) {
@@ -410,7 +409,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
         /* If the bitmap granularity is smaller than the backup granularity,
          * we need to advance the iterator pointer to the next cluster. */
         if (granularity < job->cluster_size) {
-            bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
+            bdrv_set_dirty_iter(dbi,
+                                cluster * job->cluster_size / BDRV_SECTOR_SIZE);
         }

         last_cluster = cluster - 1;
@@ -432,17 +432,15 @@ static void coroutine_fn backup_run(void *opaque)
     BackupBlockJob *job = opaque;
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
-    int64_t start, end;
+    int64_t offset;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
     int ret = 0;

     QLIST_INIT(&job->inflight_reqs);
     qemu_co_rwlock_init(&job->flush_rwlock);

-    start = 0;
-    end = DIV_ROUND_UP(job->common.len, job->cluster_size);
-
-    job->done_bitmap = bitmap_new(end);
+    job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
+                                               job->cluster_size));

     job->before_write.notify = backup_before_write_notify;
     bdrv_add_before_write_notifier(bs, &job->before_write);
@@ -457,7 +455,8 @@ static void coroutine_fn backup_run(void *opaque)
         ret = backup_run_incremental(job);
     } else {
         /* Both FULL and TOP SYNC_MODE's require copying.. */
-        for (; start < end; start++) {
+        for (offset = 0; offset < job->common.len;
+             offset += job->cluster_size) {
             bool error_is_read;
             int alloced = 0;

@@ -480,8 +479,8 @@ static void coroutine_fn backup_run(void *opaque)
                      * needed but at some point that is always the case. */
                     alloced =
                         bdrv_is_allocated(bs,
-                                start * sectors_per_cluster + i,
-                                sectors_per_cluster - i, &n);
+                                          (offset >> BDRV_SECTOR_BITS) + i,
+                                          sectors_per_cluster - i, &n);
                     i += n;

                     if (alloced || n == 0) {
@@ -499,9 +498,8 @@ static void coroutine_fn backup_run(void *opaque)
             if (alloced < 0) {
                 ret = alloced;
             } else {
-                ret = backup_do_cow(job, start * job->cluster_size,
-                                    job->cluster_size, &error_is_read,
-                                    false);
+                ret = backup_do_cow(job, offset, job->cluster_size,
+                                    &error_is_read, false);
             }
             if (ret < 0) {
                 /* Depending on error action, fail now or retry cluster */
@@ -510,7 +508,7 @@ static void coroutine_fn backup_run(void *opaque)
                 if (action == BLOCK_ERROR_ACTION_REPORT) {
                     break;
                 } else {
-                    start--;
+                    offset -= job->cluster_size;
                     continue;
                 }
             }
-- 
2.9.3


* [Qemu-devel] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (14 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 15/17] backup: Switch backup_run() " Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-18 22:15   ` [Qemu-devel] [Qemu-block] " John Snow
  2017-04-19 20:32   ` John Snow
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based Eric Blake
  2017-04-17 23:42 ` [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based John Snow
  17 siblings, 2 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Jeff Cody, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Juan Quintela, Dr. David Alan Gilbert

We are gradually moving away from sector-based interfaces towards
byte-based ones.  In the common case, allocation is unlikely to ever
use values that are not naturally sector-aligned, but byte-based
values may let us be more precise about allocation at the end of an
unaligned file, when the underlying storage permits byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly bdrv_commit(),
gets a lot simpler because it no longer has to mess with sectors;
also, it is now possible to pass NULL if the caller does not care
how much of the image is allocated beyond the initial offset.

For ease of review, bdrv_is_allocated_above() will be tackled
separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block.h |  4 ++--
 block/backup.c        | 17 +++++----------
 block/commit.c        | 21 ++++++++-----------
 block/io.c            | 49 +++++++++++++++++++++++++++++--------------
 block/stream.c        |  5 +++--
 block/vvfat.c         | 34 +++++++++++++++++-------------
 migration/block.c     |  9 ++++----
 qemu-img.c            |  5 ++++-
 qemu-io-cmds.c        | 57 +++++++++++++++++++++++++--------------------------
 9 files changed, 110 insertions(+), 91 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index b5289f7..8641149 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -422,8 +422,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
                                     int64_t sector_num,
                                     int nb_sectors, int *pnum,
                                     BlockDriverState **file);
-int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-                      int *pnum);
+int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
+                      int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
                             int64_t sector_num, int nb_sectors, int *pnum);

diff --git a/block/backup.c b/block/backup.c
index 2a51e8b..63ca208 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
     QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;

-/* Size of a cluster in sectors, instead of bytes. */
-static inline int64_t cluster_size_sectors(BackupBlockJob *job)
-{
-  return job->cluster_size / BDRV_SECTOR_SIZE;
-}
-
 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
                                                        int64_t start,
@@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
     int64_t offset;
-    int64_t sectors_per_cluster = cluster_size_sectors(job);
     int ret = 0;

     QLIST_INIT(&job->inflight_reqs);
@@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
             }

             if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
-                int i, n;
+                int i;
+                int64_t n;

                 /* Check to see if these blocks are already in the
                  * backing file. */

-                for (i = 0; i < sectors_per_cluster;) {
+                for (i = 0; i < job->cluster_size;) {
                     /* bdrv_is_allocated() only returns true/false based
                      * on the first set of sectors it comes across that
                      * are are all in the same state.
@@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
                      * backup cluster length.  We end up copying more than
                      * needed but at some point that is always the case. */
                     alloced =
-                        bdrv_is_allocated(bs,
-                                          (offset >> BDRV_SECTOR_BITS) + i,
-                                          sectors_per_cluster - i, &n);
+                        bdrv_is_allocated(bs, offset + i,
+                                          job->cluster_size - i, &n);
                     i += n;

                     if (alloced || n == 0) {
diff --git a/block/commit.c b/block/commit.c
index b7d3e60..4d6bb2a 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -429,7 +429,7 @@ fail:
 }


-#define COMMIT_BUF_SECTORS 2048
+#define COMMIT_BUF_SIZE (2048 * BDRV_SECTOR_SIZE)

 /* commit COW file into the raw image */
 int bdrv_commit(BlockDriverState *bs)
@@ -438,8 +438,9 @@ int bdrv_commit(BlockDriverState *bs)
     BlockDriverState *backing_file_bs = NULL;
     BlockDriverState *commit_top_bs = NULL;
     BlockDriver *drv = bs->drv;
-    int64_t sector, total_sectors, length, backing_length;
-    int n, ro, open_flags;
+    int64_t offset, length, backing_length;
+    int ro, open_flags;
+    int64_t n;
     int ret = 0;
     uint8_t *buf = NULL;
     Error *local_err = NULL;
@@ -517,30 +518,26 @@ int bdrv_commit(BlockDriverState *bs)
         }
     }

-    total_sectors = length >> BDRV_SECTOR_BITS;
-
     /* blk_try_blockalign() for src will choose an alignment that works for
      * backing as well, so no need to compare the alignment manually. */
-    buf = blk_try_blockalign(src, COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
+    buf = blk_try_blockalign(src, COMMIT_BUF_SIZE);
     if (buf == NULL) {
         ret = -ENOMEM;
         goto ro_cleanup;
     }

-    for (sector = 0; sector < total_sectors; sector += n) {
-        ret = bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n);
+    for (offset = 0; offset < length; offset += n) {
+        ret = bdrv_is_allocated(bs, offset, COMMIT_BUF_SIZE, &n);
         if (ret < 0) {
             goto ro_cleanup;
         }
         if (ret) {
-            ret = blk_pread(src, sector * BDRV_SECTOR_SIZE, buf,
-                            n * BDRV_SECTOR_SIZE);
+            ret = blk_pread(src, offset, buf, n);
             if (ret < 0) {
                 goto ro_cleanup;
             }

-            ret = blk_pwrite(backing, sector * BDRV_SECTOR_SIZE, buf,
-                             n * BDRV_SECTOR_SIZE, 0);
+            ret = blk_pwrite(backing, offset, buf, n, 0);
             if (ret < 0) {
                 goto ro_cleanup;
             }
diff --git a/block/io.c b/block/io.c
index 8706bfa..438a493 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1054,14 +1054,15 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
         int64_t start_sector = offset >> BDRV_SECTOR_BITS;
         int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
         unsigned int nb_sectors = end_sector - start_sector;
-        int pnum;
+        int64_t pnum;

-        ret = bdrv_is_allocated(bs, start_sector, nb_sectors, &pnum);
+        ret = bdrv_is_allocated(bs, start_sector << BDRV_SECTOR_BITS,
+                                nb_sectors << BDRV_SECTOR_BITS, &pnum);
         if (ret < 0) {
             goto out;
         }

-        if (!ret || pnum != nb_sectors) {
+        if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
             ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
             goto out;
         }
@@ -1903,15 +1904,26 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
                                        sector_num, nb_sectors, pnum, file);
 }

-int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
-                                   int nb_sectors, int *pnum)
+int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
+                                   int64_t bytes, int64_t *pnum)
 {
     BlockDriverState *file;
-    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
-                                        &file);
+    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
+    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+    int64_t ret;
+    int psectors;
+
+    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
+           bytes < INT_MAX * BDRV_SECTOR_SIZE);
+    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
+                                &file);
     if (ret < 0) {
         return ret;
     }
+    if (pnum) {
+        *pnum = psectors * BDRV_SECTOR_SIZE;
+    }
     return !!(ret & BDRV_BLOCK_ALLOCATED);
 }

@@ -1920,7 +1932,8 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
  *
  * Return true if the given sector is allocated in any image between
  * BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * sector is allocated in any image of the chain.  Return false otherwise.
+ * sector is allocated in any image of the chain.  Return false otherwise,
+ * or negative errno on failure.
  *
  * 'pnum' is set to the number of sectors (including and immediately following
  *  the specified sector) that are known to be in the same
@@ -1937,13 +1950,19 @@ int bdrv_is_allocated_above(BlockDriverState *top,

     intermediate = top;
     while (intermediate && intermediate != base) {
-        int pnum_inter;
-        ret = bdrv_is_allocated(intermediate, sector_num, nb_sectors,
+        int64_t pnum_inter;
+        int psectors_inter;
+
+        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
+                                nb_sectors * BDRV_SECTOR_SIZE,
                                 &pnum_inter);
         if (ret < 0) {
             return ret;
-        } else if (ret) {
-            *pnum = pnum_inter;
+        }
+        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
+        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
+        if (ret) {
+            *pnum = psectors_inter;
             return 1;
         }

@@ -1953,10 +1972,10 @@ int bdrv_is_allocated_above(BlockDriverState *top,
          *
          * [sector_num+x, nr_sectors] allocated.
          */
-        if (n > pnum_inter &&
+        if (n > psectors_inter &&
             (intermediate == top ||
-             sector_num + pnum_inter < intermediate->total_sectors)) {
-            n = pnum_inter;
+             sector_num + psectors_inter < intermediate->total_sectors)) {
+            n = psectors_inter;
         }

         intermediate = backing_bs(intermediate);
diff --git a/block/stream.c b/block/stream.c
index 3ed4fef..85502eb 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -138,6 +138,7 @@ static void coroutine_fn stream_run(void *opaque)

     for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
         bool copy;
+        int64_t count = 0;

         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
@@ -149,8 +150,8 @@ static void coroutine_fn stream_run(void *opaque)

         copy = false;

-        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
-                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
+        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
diff --git a/block/vvfat.c b/block/vvfat.c
index af5153d..bef2056 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1393,24 +1393,27 @@ static int vvfat_read(BlockDriverState *bs, int64_t sector_num,
 	if (sector_num >= bs->total_sectors)
 	   return -1;
 	if (s->qcow) {
-	    int n;
+            int64_t n;
             int ret;
-            ret = bdrv_is_allocated(s->qcow->bs, sector_num,
-                                    nb_sectors - i, &n);
+            ret = bdrv_is_allocated(s->qcow->bs, sector_num * BDRV_SECTOR_SIZE,
+                                    (nb_sectors - i) * BDRV_SECTOR_SIZE, &n);
             if (ret < 0) {
                 return ret;
             }
             if (ret) {
-                DLOG(fprintf(stderr, "sectors %d+%d allocated\n",
-                             (int)sector_num, n));
-                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200, n)) {
+                DLOG(fprintf(stderr, "sectors %" PRId64 "+%" PRId64
+                             " allocated\n", sector_num,
+                             n >> BDRV_SECTOR_BITS));
+                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200,
+                              n >> BDRV_SECTOR_BITS)) {
                     return -1;
                 }
-                i += n - 1;
-                sector_num += n - 1;
+                i += (n >> BDRV_SECTOR_BITS) - 1;
+                sector_num += (n >> BDRV_SECTOR_BITS) - 1;
                 continue;
             }
-DLOG(fprintf(stderr, "sector %d not allocated\n", (int)sector_num));
+            DLOG(fprintf(stderr, "sector %" PRId64 " not allocated\n",
+                         sector_num));
 	}
 	if(sector_num<s->faked_sectors) {
 	    if(sector_num<s->first_sectors_number)
@@ -1678,7 +1681,7 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,
                                         uint32_t cluster_num)
 {
     int was_modified = 0;
-    int i, dummy;
+    int i;

     if (s->qcow == NULL) {
         return 0;
@@ -1686,8 +1689,9 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,

     for (i = 0; !was_modified && i < s->sectors_per_cluster; i++) {
         was_modified = bdrv_is_allocated(s->qcow->bs,
-                                         cluster2sector(s, cluster_num) + i,
-                                         1, &dummy);
+                                         (cluster2sector(s, cluster_num) +
+                                          i) * BDRV_SECTOR_SIZE,
+                                         BDRV_SECTOR_SIZE, NULL);
     }

     /*
@@ -1834,7 +1838,7 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
 	    }

 	    if (copy_it) {
-		int i, dummy;
+                int i;
 		/*
 		 * This is horribly inefficient, but that is okay, since
 		 * it is rarely executed, if at all.
@@ -1845,7 +1849,9 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
                 for (i = 0; i < s->sectors_per_cluster; i++) {
                     int res;

-                    res = bdrv_is_allocated(s->qcow->bs, offset + i, 1, &dummy);
+                    res = bdrv_is_allocated(s->qcow->bs,
+                                            (offset + i) * BDRV_SECTOR_SIZE,
+                                            BDRV_SECTOR_SIZE, NULL);
                     if (res < 0) {
                         return -1;
                     }
diff --git a/migration/block.c b/migration/block.c
index 7734ff7..18d50ff 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -36,7 +36,7 @@
 #define BLK_MIG_FLAG_PROGRESS           0x04
 #define BLK_MIG_FLAG_ZERO_BLOCK         0x08

-#define MAX_IS_ALLOCATED_SEARCH 65536
+#define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)

 #define MAX_INFLIGHT_IO 512

@@ -272,6 +272,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
     BlockBackend *bb = bmds->blk;
     BlkMigBlock *blk;
     int nr_sectors;
+    int64_t count;

     if (bmds->shared_base) {
         qemu_mutex_lock_iothread();
@@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
         /* Skip unallocated sectors; intentionally treats failure as
          * an allocated sector */
         while (cur_sector < total_sectors &&
-               !bdrv_is_allocated(blk_bs(bb), cur_sector,
-                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
-            cur_sector += nr_sectors;
+               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
+                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
+            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
         }
         aio_context_release(blk_get_aio_context(bb));
         qemu_mutex_unlock_iothread();
diff --git a/qemu-img.c b/qemu-img.c
index 37c2894..2f21d69 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3207,6 +3207,7 @@ static int img_rebase(int argc, char **argv)
         int64_t new_backing_num_sectors = 0;
         uint64_t sector;
         int n;
+        int64_t count;
         float local_progress = 0;

         buf_old = blk_blockalign(blk, IO_BUF_SIZE);
@@ -3254,12 +3255,14 @@ static int img_rebase(int argc, char **argv)
             }

             /* If the cluster is allocated, we don't need to take action */
-            ret = bdrv_is_allocated(bs, sector, n, &n);
+            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
+                                    n << BDRV_SECTOR_BITS, &count);
             if (ret < 0) {
                 error_report("error while reading image metadata: %s",
                              strerror(-ret));
                 goto out;
             }
+            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
             if (ret) {
                 continue;
             }
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 0ac7457..62278ea 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1760,12 +1760,12 @@ out:
 static int alloc_f(BlockBackend *blk, int argc, char **argv)
 {
     BlockDriverState *bs = blk_bs(blk);
-    int64_t offset, sector_num, nb_sectors, remaining, bytes;
+    int64_t offset, start, remaining, bytes;
     char s1[64];
-    int num, ret;
-    int64_t sum_alloc;
+    int ret;
+    int64_t num, sum_alloc;

-    offset = cvtnum(argv[1]);
+    start = offset = cvtnum(argv[1]);
     if (offset < 0) {
         print_cvtnum_err(offset, argv[1]);
         return 0;
@@ -1793,32 +1793,30 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
                bytes);
         return 0;
     }
-    nb_sectors = bytes >> BDRV_SECTOR_BITS;

-    remaining = nb_sectors;
+    remaining = bytes;
     sum_alloc = 0;
-    sector_num = offset >> 9;
     while (remaining) {
-        ret = bdrv_is_allocated(bs, sector_num, remaining, &num);
+        ret = bdrv_is_allocated(bs, offset, remaining, &num);
         if (ret < 0) {
             printf("is_allocated failed: %s\n", strerror(-ret));
             return 0;
         }
-        sector_num += num;
+        offset += num;
         remaining -= num;
         if (ret) {
             sum_alloc += num;
         }
         if (num == 0) {
-            nb_sectors -= remaining;
+            bytes -= remaining;
             remaining = 0;
         }
     }

-    cvtstr(offset, s1, sizeof(s1));
+    cvtstr(start, s1, sizeof(s1));

     printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
-           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
+           sum_alloc, bytes, s1);
     return 0;
 }

@@ -1833,14 +1831,15 @@ static const cmdinfo_t alloc_cmd = {
 };


-static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
-                            int64_t nb_sectors, int64_t *pnum)
+static int map_is_allocated(BlockDriverState *bs, int64_t offset,
+                            int64_t bytes, int64_t *pnum)
 {
-    int num, num_checked;
+    int64_t num;
+    int num_checked;
     int ret, firstret;

-    num_checked = MIN(nb_sectors, INT_MAX);
-    ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+    num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
+    ret = bdrv_is_allocated(bs, offset, num_checked, &num);
     if (ret < 0) {
         return ret;
     }
@@ -1848,12 +1847,12 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
     firstret = ret;
     *pnum = num;

-    while (nb_sectors > 0 && ret == firstret) {
-        sector_num += num;
-        nb_sectors -= num;
+    while (bytes > 0 && ret == firstret) {
+        offset += num;
+        bytes -= num;

-        num_checked = MIN(nb_sectors, INT_MAX);
-        ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+        num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
+        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
         if (ret == firstret && num) {
             *pnum += num;
         } else {
@@ -1867,7 +1866,7 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
 static int map_f(BlockBackend *blk, int argc, char **argv)
 {
     int64_t offset;
-    int64_t nb_sectors, total_sectors;
+    int64_t bytes, total_sectors;
     char s1[64];
     int64_t num;
     int ret;
@@ -1881,10 +1880,10 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
         return 0;
     }

-    nb_sectors = total_sectors;
+    bytes = total_sectors * BDRV_SECTOR_SIZE;

     do {
-        ret = map_is_allocated(blk_bs(blk), offset, nb_sectors, &num);
+        ret = map_is_allocated(blk_bs(blk), offset, bytes, &num);
         if (ret < 0) {
             error_report("Failed to get allocation status: %s", strerror(-ret));
             return 0;
@@ -1894,13 +1893,13 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
         }

         retstr = ret ? "    allocated" : "not allocated";
-        cvtstr(offset << 9ULL, s1, sizeof(s1));
+        cvtstr(offset, s1, sizeof(s1));
         printf("[% 24" PRId64 "] % 16" PRId64 " bytes %s at offset %s (%d)\n",
-               offset << 9ULL, num << 9ULL, retstr, s1, ret);
+               offset, num, retstr, s1, ret);

         offset += num;
-        nb_sectors -= num;
-    } while (offset < total_sectors);
+        bytes -= num;
+    } while (offset < total_sectors * BDRV_SECTOR_SIZE);

     return 0;
 }
-- 
2.9.3


* [Qemu-devel] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (15 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based Eric Blake
@ 2017-04-11 22:29 ` Eric Blake
  2017-04-24 23:06   ` [Qemu-devel] [Qemu-block] " John Snow
  2017-04-17 23:42 ` [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based John Snow
  17 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-11 22:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Jeff Cody, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Wen Congyang, Changlong Xie

We are gradually moving away from sector-based interfaces towards
byte-based ones.  In the common case, allocation is unlikely to ever
use values that are not naturally sector-aligned, but byte-based
values may let us be more precise about allocation at the end of an
unaligned file, when the underlying storage permits byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly stream_run(),
gets a lot simpler because it no longer has to mess with sectors.

For ease of review, bdrv_is_allocated() was tackled separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block.h |  2 +-
 block/commit.c        | 20 ++++++++------------
 block/io.c            | 36 +++++++++++++++---------------------
 block/mirror.c        |  5 ++++-
 block/replication.c   | 17 ++++++++++++-----
 block/stream.c        | 21 +++++++++------------
 qemu-img.c            | 10 +++++++---
 7 files changed, 56 insertions(+), 55 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8641149..740cb86 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -425,7 +425,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
                       int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
-                            int64_t sector_num, int nb_sectors, int *pnum);
+                            int64_t offset, int64_t bytes, int64_t *pnum);

 bool bdrv_is_read_only(BlockDriverState *bs);
 bool bdrv_is_sg(BlockDriverState *bs);
diff --git a/block/commit.c b/block/commit.c
index 4d6bb2a..989de7d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -132,7 +132,7 @@ static void coroutine_fn commit_run(void *opaque)
     int64_t offset;
     uint64_t delay_ns = 0;
     int ret = 0;
-    int n = 0; /* sectors */
+    int64_t n = 0; /* bytes */
     void *buf = NULL;
     int bytes_written = 0;
     int64_t base_len;
@@ -157,7 +157,7 @@ static void coroutine_fn commit_run(void *opaque)

     buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-    for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
+    for (offset = 0; offset < s->common.len; offset += n) {
         bool copy;

         /* Note that even when no rate limit is applied we need to yield
@@ -169,15 +169,12 @@ static void coroutine_fn commit_run(void *opaque)
         }
         /* Copy if allocated above the base */
         ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-                                      offset / BDRV_SECTOR_SIZE,
-                                      COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
-                                      &n);
+                                      offset, COMMIT_BUFFER_SIZE, &n);
         copy = (ret == 1);
-        trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
+        trace_commit_one_iteration(s, offset, n, ret);
         if (copy) {
-            ret = commit_populate(s->top, s->base, offset,
-                                  n * BDRV_SECTOR_SIZE, buf);
-            bytes_written += n * BDRV_SECTOR_SIZE;
+            ret = commit_populate(s->top, s->base, offset, n, buf);
+            bytes_written += n;
         }
         if (ret < 0) {
             BlockErrorAction action =
@@ -190,11 +187,10 @@ static void coroutine_fn commit_run(void *opaque)
             }
         }
         /* Publish progress */
-        s->common.offset += n * BDRV_SECTOR_SIZE;
+        s->common.offset += n;

         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit,
-                                                 n * BDRV_SECTOR_SIZE);
+            delay_ns = ratelimit_calculate_delay(&s->limit, n);
         }
     }

diff --git a/block/io.c b/block/io.c
index 438a493..9218329 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1930,52 +1930,46 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
- * Return true if the given sector is allocated in any image between
+ * Return true if the given offset is allocated in any image between
  * BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * sector is allocated in any image of the chain.  Return false otherwise,
+ * offset is allocated in any image of the chain.  Return false otherwise,
  * or negative errno on failure.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- *  the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ *  the specified offset) that are known to be in the same
  *  allocated/unallocated state.
  *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
                             BlockDriverState *base,
-                            int64_t sector_num,
-                            int nb_sectors, int *pnum)
+                            int64_t offset, int64_t bytes, int64_t *pnum)
 {
     BlockDriverState *intermediate;
-    int ret, n = nb_sectors;
+    int ret;
+    int64_t n = bytes;

     intermediate = top;
     while (intermediate && intermediate != base) {
         int64_t pnum_inter;
-        int psectors_inter;

-        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
-                                nb_sectors * BDRV_SECTOR_SIZE,
-                                &pnum_inter);
+        ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
         if (ret < 0) {
             return ret;
         }
-        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
-        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
         if (ret) {
-            *pnum = psectors_inter;
+            *pnum = pnum_inter;
             return 1;
         }

         /*
-         * [sector_num, nb_sectors] is unallocated on top but intermediate
-         * might have
-         *
-         * [sector_num+x, nr_sectors] allocated.
+         * [offset, bytes] is unallocated on top but intermediate
+         * might have [offset+x, bytes-x] allocated.
          */
-        if (n > psectors_inter &&
+        if (n > pnum_inter &&
             (intermediate == top ||
-             sector_num + psectors_inter < intermediate->total_sectors)) {
-            n = psectors_inter;
+             offset + pnum_inter < (intermediate->total_sectors *
+                                    BDRV_SECTOR_SIZE))) {
+            n = pnum_inter;
         }

         intermediate = backing_bs(intermediate);
diff --git a/block/mirror.c b/block/mirror.c
index 8de0492..c92335a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -609,6 +609,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
     BlockDriverState *bs = s->source;
     BlockDriverState *target_bs = blk_bs(s->target);
     int ret, n;
+    int64_t count;

     end = s->bdev_length / BDRV_SECTOR_SIZE;

@@ -658,11 +659,13 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             return 0;
         }

-        ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
+        ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
+                                      nb_sectors * BDRV_SECTOR_SIZE, &count);
         if (ret < 0) {
             return ret;
         }

+        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
         assert(n > 0);
         if (ret == 1) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
diff --git a/block/replication.c b/block/replication.c
index 414ecc4..605d90f 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -264,7 +264,8 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
     BdrvChild *top = bs->file;
     BdrvChild *base = s->secondary_disk;
     BdrvChild *target;
-    int ret, n;
+    int ret;
+    int64_t n;

     ret = replication_get_io_status(s);
     if (ret < 0) {
@@ -283,14 +284,20 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
      */
     qemu_iovec_init(&hd_qiov, qiov->niov);
     while (remaining_sectors > 0) {
-        ret = bdrv_is_allocated_above(top->bs, base->bs, sector_num,
-                                      remaining_sectors, &n);
+        int64_t count;
+
+        ret = bdrv_is_allocated_above(top->bs, base->bs,
+                                      sector_num * BDRV_SECTOR_SIZE,
+                                      remaining_sectors * BDRV_SECTOR_SIZE,
+                                      &count);
         if (ret < 0) {
             goto out1;
         }

+        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
+        n = count >> BDRV_SECTOR_BITS;
         qemu_iovec_reset(&hd_qiov);
-        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, count);

         target = ret ? top : base;
         ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
@@ -300,7 +307,7 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,

         remaining_sectors -= n;
         sector_num += n;
-        bytes_done += n * BDRV_SECTOR_SIZE;
+        bytes_done += count;
     }

 out1:
diff --git a/block/stream.c b/block/stream.c
index 85502eb..9033655 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -112,7 +112,7 @@ static void coroutine_fn stream_run(void *opaque)
     uint64_t delay_ns = 0;
     int error = 0;
     int ret = 0;
-    int n = 0; /* sectors */
+    int64_t n = 0; /* bytes */
     void *buf;

     if (!bs->backing) {
@@ -136,9 +136,8 @@ static void coroutine_fn stream_run(void *opaque)
         bdrv_enable_copy_on_read(bs);
     }

-    for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
+    for ( ; offset < s->common.len; offset += n) {
         bool copy;
-        int64_t count = 0;

         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
@@ -150,26 +149,25 @@ static void coroutine_fn stream_run(void *opaque)

         copy = false;

-        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
-        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
+        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
             ret = bdrv_is_allocated_above(backing_bs(bs), base,
-                                          offset / BDRV_SECTOR_SIZE, n, &n);
+                                          offset, n, &n);

             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
-                n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
+                n = s->common.len - offset;
             }

             copy = (ret == 1);
         }
-        trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
+        trace_stream_one_iteration(s, offset, n, ret);
         if (copy) {
-            ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
+            ret = stream_populate(blk, offset, n, buf);
         }
         if (ret < 0) {
             BlockErrorAction action =
@@ -188,10 +186,9 @@ static void coroutine_fn stream_run(void *opaque)
         ret = 0;

         /* Publish progress */
-        s->common.offset += n * BDRV_SECTOR_SIZE;
+        s->common.offset += n;
         if (copy && s->common.speed) {
-            delay_ns = ratelimit_calculate_delay(&s->limit,
-                                                 n * BDRV_SECTOR_SIZE);
+            delay_ns = ratelimit_calculate_delay(&s->limit, n);
         }
     }

diff --git a/qemu-img.c b/qemu-img.c
index 2f21d69..d96b4d6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1448,12 +1448,16 @@ static int img_compare(int argc, char **argv)
         }

         for (;;) {
+            int64_t count;
+
             nb_sectors = sectors_to_process(total_sectors_over, sector_num);
             if (nb_sectors <= 0) {
                 break;
             }
-            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL, sector_num,
-                                          nb_sectors, &pnum);
+            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
+                                          sector_num * BDRV_SECTOR_SIZE,
+                                          nb_sectors * BDRV_SECTOR_SIZE,
+                                          &count);
             if (ret < 0) {
                 ret = 3;
                 error_report("Sector allocation test failed for %s",
@@ -1461,7 +1465,7 @@ static int img_compare(int argc, char **argv)
                 goto out;

             }
-            nb_sectors = pnum;
+            nb_sectors = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
             if (ret) {
                 ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
                                           filename_over, buf1, quiet);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
@ 2017-04-17 19:18   ` John Snow
  2017-04-17 19:51     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-17 19:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block, Max Reitz



On 04/11/2017 06:29 PM, Eric Blake wrote:
> The user interface specifies job rate limits in bytes/second.
> It's pointless to have our internal representation track things
> in sectors/second, particularly since we want to move away from
> sector-based interfaces.
> 
> Fix up a doc typo found while verifying that the ratelimit
> code handles the scaling difference.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/qemu/ratelimit.h |  3 ++-
>  block/backup.c           |  5 +++--
>  block/commit.c           |  5 +++--
>  block/mirror.c           | 13 +++++++------
>  block/stream.c           |  5 +++--
>  5 files changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
> index 8da1232..8dece48 100644
> --- a/include/qemu/ratelimit.h
> +++ b/include/qemu/ratelimit.h
> @@ -24,7 +24,8 @@ typedef struct {
> 
>  /** Calculate and return delay for next request in ns
>   *
> - * Record that we sent @p n data units. If we may send more data units
> + * Record that we sent @n data units (where @n matches the scale chosen
> + * during ratelimit_set_speed). If we may send more data units
>   * in the current time slice, return 0 (i.e. no delay). Otherwise
>   * return the amount of time (in ns) until the start of the next time
>   * slice that will permit sending the next chunk of data.
> diff --git a/block/backup.c b/block/backup.c
> index a4fb288..bcfa321 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -208,7 +208,7 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
>          error_setg(errp, QERR_INVALID_PARAMETER, "speed");
>          return;
>      }
> -    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
>  }
> 
>  static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
> @@ -359,7 +359,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
>       */
>      if (job->common.speed) {
>          uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
> -                                                      job->sectors_read);
> +                                                      job->sectors_read *
> +                                                      BDRV_SECTOR_SIZE);
>          job->sectors_read = 0;
>          block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
>      } else {
> diff --git a/block/commit.c b/block/commit.c
> index 76a0d98..8b684e0 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -195,7 +195,8 @@ static void coroutine_fn commit_run(void *opaque)
>          s->common.offset += n * BDRV_SECTOR_SIZE;
> 
>          if (copy && s->common.speed) {
> -            delay_ns = ratelimit_calculate_delay(&s->limit, n);
> +            delay_ns = ratelimit_calculate_delay(&s->limit,
> +                                                 n * BDRV_SECTOR_SIZE);

You could probably factor out this calculation in conjunction with the
offset update above, but no matter.

>          }
>      }
> 
> @@ -217,7 +218,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
>          error_setg(errp, QERR_INVALID_PARAMETER, "speed");
>          return;
>      }
> -    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
>  }
> 
>  static const BlockJobDriver commit_job_driver = {
> diff --git a/block/mirror.c b/block/mirror.c
> index 2173a2f..a7d0960 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -391,7 +391,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
>      while (nb_chunks > 0 && sector_num < end) {
>          int64_t ret;
> -        int io_sectors, io_sectors_acct;
> +        int io_sectors;
> +        int64_t io_bytes_acct;
>          BlockDriverState *file;
>          enum MirrorMethod {
>              MIRROR_METHOD_COPY,
> @@ -439,16 +440,16 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>          switch (mirror_method) {
>          case MIRROR_METHOD_COPY:
>              io_sectors = mirror_do_read(s, sector_num, io_sectors);
> -            io_sectors_acct = io_sectors;
> +            io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
>              break;
>          case MIRROR_METHOD_ZERO:
>          case MIRROR_METHOD_DISCARD:
>              mirror_do_zero_or_discard(s, sector_num, io_sectors,
>                                        mirror_method == MIRROR_METHOD_DISCARD);
>              if (write_zeroes_ok) {
> -                io_sectors_acct = 0;
> +                io_bytes_acct = 0;
>              } else {
> -                io_sectors_acct = io_sectors;
> +                io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
>              }
>              break;
>          default:
> @@ -458,7 +459,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>          sector_num += io_sectors;
>          nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
>          if (s->common.speed) {
> -            delay_ns = ratelimit_calculate_delay(&s->limit, io_sectors_acct);
> +            delay_ns = ratelimit_calculate_delay(&s->limit, io_bytes_acct);
>          }
>      }
>      return delay_ns;
> @@ -918,7 +919,7 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
>          error_setg(errp, QERR_INVALID_PARAMETER, "speed");
>          return;
>      }
> -    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
>  }
> 
>  static void mirror_complete(BlockJob *job, Error **errp)
> diff --git a/block/stream.c b/block/stream.c
> index 0113710..11c5cf1 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -191,7 +191,8 @@ static void coroutine_fn stream_run(void *opaque)
>          /* Publish progress */
>          s->common.offset += n * BDRV_SECTOR_SIZE;
>          if (copy && s->common.speed) {
> -            delay_ns = ratelimit_calculate_delay(&s->limit, n);
> +            delay_ns = ratelimit_calculate_delay(&s->limit,
> +                                                 n * BDRV_SECTOR_SIZE);

Same kind of comment here.

>          }
>      }
> 
> @@ -220,7 +221,7 @@ static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
>          error_setg(errp, QERR_INVALID_PARAMETER, "speed");
>          return;
>      }
> -    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +    ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
>  }
> 
>  static const BlockJobDriver stream_job_driver = {
> 

Reviewed-by: John Snow <jsnow@redhat.com>



* Re: [Qemu-devel] [Qemu-block] [PATCH 02/17] trace: Show blockjob actions via bytes, not sectors
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 02/17] trace: Show blockjob actions " Eric Blake
@ 2017-04-17 19:18   ` John Snow
  2017-04-17 19:55     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-17 19:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block, Max Reitz



On 04/11/2017 06:29 PM, Eric Blake wrote:
> Upcoming patches are going to switch to byte-based interfaces
> instead of sector-based.  Even worse, trace_backup_do_cow_enter()
> had a weird mix of cluster and sector indices.  Make the tracing
> uniformly use bytes.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/backup.c     | 16 ++++++++++------
>  block/commit.c     |  3 ++-
>  block/mirror.c     | 26 +++++++++++++++++---------
>  block/stream.c     |  3 ++-
>  block/trace-events | 14 +++++++-------
>  5 files changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index bcfa321..b28df8c 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -102,6 +102,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>      void *bounce_buffer = NULL;
>      int ret = 0;
>      int64_t sectors_per_cluster = cluster_size_sectors(job);
> +    int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
>      int64_t start, end;
>      int n;
> 
> @@ -110,18 +111,20 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>      start = sector_num / sectors_per_cluster;
>      end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> 
> -    trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
> +    trace_backup_do_cow_enter(job, start * bytes_per_cluster,
> +                              sector_num * BDRV_SECTOR_SIZE,
> +                              nb_sectors * BDRV_SECTOR_SIZE);
> 
>      wait_for_overlapping_requests(job, start, end);
>      cow_request_begin(&cow_request, job, start, end);
> 
>      for (; start < end; start++) {
>          if (test_bit(start, job->done_bitmap)) {
> -            trace_backup_do_cow_skip(job, start);
> +            trace_backup_do_cow_skip(job, start * bytes_per_cluster);
>              continue; /* already copied */
>          }
> 
> -        trace_backup_do_cow_process(job, start);
> +        trace_backup_do_cow_process(job, start * bytes_per_cluster);
> 
>          n = MIN(sectors_per_cluster,
>                  job->common.len / BDRV_SECTOR_SIZE -
> @@ -138,7 +141,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>                              bounce_qiov.size, &bounce_qiov,
>                              is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
>          if (ret < 0) {
> -            trace_backup_do_cow_read_fail(job, start, ret);
> +            trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
>              if (error_is_read) {
>                  *error_is_read = true;
>              }
> @@ -154,7 +157,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>                                   job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
>          }
>          if (ret < 0) {
> -            trace_backup_do_cow_write_fail(job, start, ret);
> +            trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
>              if (error_is_read) {
>                  *error_is_read = false;
>              }
> @@ -177,7 +180,8 @@ out:
> 
>      cow_request_end(&cow_request);
> 
> -    trace_backup_do_cow_return(job, sector_num, nb_sectors, ret);
> +    trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
> +                               nb_sectors * BDRV_SECTOR_SIZE, ret);
> 
>      qemu_co_rwlock_unlock(&job->flush_rwlock);
> 
> diff --git a/block/commit.c b/block/commit.c
> index 8b684e0..11dab98 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -176,7 +176,8 @@ static void coroutine_fn commit_run(void *opaque)
>                                        COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
>                                        &n);
>          copy = (ret == 1);
> -        trace_commit_one_iteration(s, sector_num, n, ret);
> +        trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> +                                   n * BDRV_SECTOR_SIZE, ret);
>          if (copy) {
>              ret = commit_populate(s->top, s->base, sector_num, n, buf);
>              bytes_written += n * BDRV_SECTOR_SIZE;
> diff --git a/block/mirror.c b/block/mirror.c
> index a7d0960..d83d482 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -103,7 +103,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>      int64_t chunk_num;
>      int i, nb_chunks, sectors_per_chunk;
> 
> -    trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
> +    trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
> +                                op->nb_sectors * BDRV_SECTOR_SIZE, ret);
> 
>      s->in_flight--;
>      s->sectors_in_flight -= op->nb_sectors;
> @@ -268,7 +269,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
>      nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> 
>      while (s->buf_free_count < nb_chunks) {
> -        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> +        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
> +                                     s->in_flight);
>          mirror_wait_for_io(s);
>      }
> 
> @@ -294,7 +296,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
>      /* Copy the dirty cluster.  */
>      s->in_flight++;
>      s->sectors_in_flight += nb_sectors;
> -    trace_mirror_one_iteration(s, sector_num, nb_sectors);
> +    trace_mirror_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> +                               nb_sectors * BDRV_SECTOR_SIZE);
> 
>      blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov, 0,
>                     mirror_read_complete, op);
> @@ -346,13 +349,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      if (sector_num < 0) {
>          bdrv_set_dirty_iter(s->dbi, 0);
>          sector_num = bdrv_dirty_iter_next(s->dbi);
> -        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> +        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
> +                                  BDRV_SECTOR_SIZE);

mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64

I guess the print doesn't really bother to say what the unit is, just a
"dirty count."

Does this conflict with the bitmap series, though? (Won't need to scale
by BDRV_SECTOR_SIZE after that, I think.)

>          assert(sector_num >= 0);
>      }
> 
>      first_chunk = sector_num / sectors_per_chunk;
>      while (test_bit(first_chunk, s->in_flight_bitmap)) {
> -        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> +        trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
> +                                     s->in_flight);
>          mirror_wait_for_io(s);
>      }
> 
> @@ -428,7 +433,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>          }
> 
>          while (s->in_flight >= MAX_IN_FLIGHT) {
> -            trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> +            trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
> +                                         s->in_flight);
>              mirror_wait_for_io(s);
>          }
> 
> @@ -811,7 +817,8 @@ static void coroutine_fn mirror_run(void *opaque)
>              s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
>              if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
>                  (cnt == 0 && s->in_flight > 0)) {
> -                trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
> +                trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
> +                                   s->buf_free_count, s->in_flight);

I guess this is another case where the unit wasn't really specified, so
we can just change it.

>                  mirror_wait_for_io(s);
>                  continue;
>              } else if (cnt != 0) {
> @@ -852,7 +859,7 @@ static void coroutine_fn mirror_run(void *opaque)
>               * whether to switch to target check one last time if I/O has
>               * come in the meanwhile, and if not flush the data to disk.
>               */
> -            trace_mirror_before_drain(s, cnt);
> +            trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);
> 
>              bdrv_drained_begin(bs);
>              cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> @@ -871,7 +878,8 @@ static void coroutine_fn mirror_run(void *opaque)
>          }
> 
>          ret = 0;
> -        trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
> +        trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
> +                                  s->synced, delay_ns);
>          if (!s->synced) {
>              block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
>              if (block_job_is_cancelled(&s->common)) {
> diff --git a/block/stream.c b/block/stream.c
> index 11c5cf1..3134ec5 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -168,7 +168,8 @@ static void coroutine_fn stream_run(void *opaque)
> 
>              copy = (ret == 1);
>          }
> -        trace_stream_one_iteration(s, sector_num, n, ret);
> +        trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
> +                                   n * BDRV_SECTOR_SIZE, ret);
>          if (copy) {
>              ret = stream_populate(blk, sector_num, n, buf);
>          }
> diff --git a/block/trace-events b/block/trace-events
> index 0bc5c0a..04f6463 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -18,11 +18,11 @@ bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p off
>  bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u"
> 
>  # block/stream.c
> -stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
> +stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
>  stream_start(void *bs, void *base, void *s) "bs %p base %p s %p"
> 
>  # block/commit.c
> -commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
> +commit_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
>  commit_start(void *bs, void *base, void *top, void *s) "bs %p base %p top %p s %p"
> 
>  # block/mirror.c
> @@ -31,14 +31,14 @@ mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
>  mirror_before_flush(void *s) "s %p"
>  mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
>  mirror_before_sleep(void *s, int64_t cnt, int synced, uint64_t delay_ns) "s %p dirty count %"PRId64" synced %d delay %"PRIu64"ns"
> -mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
> -mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors, int ret) "s %p sector_num %"PRId64" nb_sectors %d ret %d"
> +mirror_one_iteration(void *s, int64_t offset, uint64_t bytes) "s %p offset %" PRId64 " bytes %" PRIu64
> +mirror_iteration_done(void *s, int64_t offset, uint64_t bytes, int ret) "s %p offset %" PRId64 " bytes %" PRIu64 " ret %d"
>  mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
> -mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
> +mirror_yield_in_flight(void *s, int64_t offset, int in_flight) "s %p offset %" PRId64 " in_flight %d"
> 
>  # block/backup.c
> -backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
> -backup_do_cow_return(void *job, int64_t sector_num, int nb_sectors, int ret) "job %p sector_num %"PRId64" nb_sectors %d ret %d"
> +backup_do_cow_enter(void *job, int64_t start, int64_t offset, uint64_t bytes) "job %p start %" PRId64 " offset %" PRId64 " bytes %" PRIu64
> +backup_do_cow_return(void *job, int64_t offset, uint64_t bytes, int ret) "job %p offset %" PRId64 " bytes %" PRIu64 " ret %d"

>  backup_do_cow_skip(void *job, int64_t start) "job %p start %"PRId64
>  backup_do_cow_process(void *job, int64_t start) "job %p start %"PRId64
>  backup_do_cow_read_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"

I guess these three were "cluster" based, but you didn't need to change
anything to assert them as byte-based.

> 

Under the assumption that it's okay to change tracing output without
being able to differentiate between new and old output:

Reviewed-by: John Snow <jsnow@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors
  2017-04-17 19:18   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2017-04-17 19:51     ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-17 19:51 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, qemu-block, Max Reitz


On 04/17/2017 02:18 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> The user interface specifies job rate limits in bytes/second.
>> It's pointless to have our internal representation track things
>> in sectors/second, particularly since we want to move away from
>> sector-based interfaces.
>>
>> Fix up a doc typo found while verifying that the ratelimit
>> code handles the scaling difference.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---

>> +++ b/block/commit.c
>> @@ -195,7 +195,8 @@ static void coroutine_fn commit_run(void *opaque)
>>          s->common.offset += n * BDRV_SECTOR_SIZE;
>>
>>          if (copy && s->common.speed) {
>> -            delay_ns = ratelimit_calculate_delay(&s->limit, n);
>> +            delay_ns = ratelimit_calculate_delay(&s->limit,
>> +                                                 n * BDRV_SECTOR_SIZE);
> 
> You could probably factor out this calculation in conjunction with the
> offset update above, but no matter.

It gets simplified in a later patch, when I switch the entire function
to track by bytes instead of sectors.


>> +++ b/block/stream.c
>> @@ -191,7 +191,8 @@ static void coroutine_fn stream_run(void *opaque)
>>          /* Publish progress */
>>          s->common.offset += n * BDRV_SECTOR_SIZE;
>>          if (copy && s->common.speed) {
>> -            delay_ns = ratelimit_calculate_delay(&s->limit, n);
>> +            delay_ns = ratelimit_calculate_delay(&s->limit,
>> +                                                 n * BDRV_SECTOR_SIZE);
> 
> Same kind of comment here.

And same response :)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 02/17] trace: Show blockjob actions via bytes, not sectors
  2017-04-17 19:18   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2017-04-17 19:55     ` Eric Blake
  2017-04-19  9:12       ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-17 19:55 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, qemu-block, Max Reitz, Stefan Hajnoczi


On 04/17/2017 02:18 PM, John Snow wrote:

>> @@ -346,13 +349,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>>      if (sector_num < 0) {
>>          bdrv_set_dirty_iter(s->dbi, 0);
>>          sector_num = bdrv_dirty_iter_next(s->dbi);
>> -        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
>> +        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
>> +                                  BDRV_SECTOR_SIZE);
> 
> mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
> 
> I guess the print doesn't really bother to say what the unit is, just a
> "dirty count."
> 
> Does this conflict with the bitmap series, though? (Won't need to scale
> by BDRV_SECTOR_SIZE after that, I think.)
> 

That series depends on this one going in first (as currently written),
and indeed, in that series, the code DOES simplify to drop the '*
BDRV_SECTOR_SIZE' in the patch that changes the return scale of
bdrv_get_dirty_count().


>>  backup_do_cow_skip(void *job, int64_t start) "job %p start %"PRId64
>>  backup_do_cow_process(void *job, int64_t start) "job %p start %"PRId64
>>  backup_do_cow_read_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"
> 
> I guess these three were "cluster" based, but you didn't need to change
> anything to assert them as byte-based.
> 
>>
> 
> Under the assumption that it's okay to change tracing output without
> being able to differentiate between new and old output:

I'll leave that to Stefan's discretion as trace maintainer, but my
thoughts are that tracing is for debugging purposes, and as long as you
know what binary you are debugging, you can figure out what the trace
message meant by referring back to the source that generated that binary.

> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 13/17] backup: Switch block_backup.h to byte-based
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 13/17] backup: Switch block_backup.h " Eric Blake
@ 2017-04-17 23:24   ` John Snow
  2017-04-18  0:54     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-17 23:24 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Changlong Xie, qemu-block, Wen Congyang, Max Reitz



On 04/11/2017 06:29 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> the public interface to backup jobs (no semantic change), including
> a change to CowRequest to track by bytes instead of cluster indices.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/block_backup.h | 11 +++++------
>  block/backup.c               | 29 ++++++++++++++---------------
>  block/replication.c          | 12 ++++++++----
>  3 files changed, 27 insertions(+), 25 deletions(-)
> 
> diff --git a/include/block/block_backup.h b/include/block/block_backup.h
> index 8a75947..994a3bd 100644
> --- a/include/block/block_backup.h
> +++ b/include/block/block_backup.h
> @@ -21,17 +21,16 @@
>  #include "block/block_int.h"
> 
>  typedef struct CowRequest {
> -    int64_t start;
> -    int64_t end;
> +    int64_t start_byte;
> +    int64_t end_byte;
>      QLIST_ENTRY(CowRequest) list;
>      CoQueue wait_queue; /* coroutines blocked on this request */
>  } CowRequest;
> 
> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
> -                                          int nb_sectors);
> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
> +                                          uint64_t bytes);
>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
> -                              int64_t sector_num,
> -                              int nb_sectors);
> +                              int64_t offset, uint64_t bytes);
>  void backup_cow_request_end(CowRequest *req);

Should we adjust the parameter names of cow_request_begin and
wait_for_overlapping_requests, too?

> 
>  void backup_do_checkpoint(BlockJob *job, Error **errp);
> diff --git a/block/backup.c b/block/backup.c
> index a64a162..0502c1a 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -64,7 +64,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
>      do {
>          retry = false;
>          QLIST_FOREACH(req, &job->inflight_reqs, list) {
> -            if (end > req->start && start < req->end) {
> +            if (end > req->start_byte && start < req->end_byte) {
>                  qemu_co_queue_wait(&req->wait_queue, NULL);
>                  retry = true;
>                  break;
> @@ -77,8 +77,8 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
>  static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
>                                       int64_t start, int64_t end)
>  {
> -    req->start = start;
> -    req->end = end;
> +    req->start_byte = start;
> +    req->end_byte = end;
>      qemu_co_queue_init(&req->wait_queue);
>      QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
>  }
> @@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>                                sector_num * BDRV_SECTOR_SIZE,
>                                nb_sectors * BDRV_SECTOR_SIZE);
> 
> -    wait_for_overlapping_requests(job, start, end);
> -    cow_request_begin(&cow_request, job, start, end);
> +    wait_for_overlapping_requests(job, start * job->cluster_size,
> +                                  end * job->cluster_size);
> +    cow_request_begin(&cow_request, job, start * job->cluster_size,
> +                      end * job->cluster_size);
> 
>      for (; start < end; start++) {
>          if (test_bit(start, job->done_bitmap)) {
> @@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
>      bitmap_zero(backup_job->done_bitmap, len);
>  }
> 
> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
> -                                          int nb_sectors)
> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
> +                                          uint64_t bytes)
>  {
>      BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> -    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
>      int64_t start, end;
> 
>      assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
> 
> -    start = sector_num / sectors_per_cluster;
> -    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> +    start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
> +    end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
>      wait_for_overlapping_requests(backup_job, start, end);
>  }
> 
>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
> -                              int64_t sector_num,
> -                              int nb_sectors)
> +                              int64_t offset, uint64_t bytes)
>  {
>      BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> -    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
>      int64_t start, end;
> 
>      assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
> 
> -    start = sector_num / sectors_per_cluster;
> -    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> +    start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
> +    end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
>      cow_request_begin(req, backup_job, start, end);
>  }
> 
> diff --git a/block/replication.c b/block/replication.c
> index bf3c395..414ecc4 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -234,10 +234,14 @@ static coroutine_fn int replication_co_readv(BlockDriverState *bs,
>      }
> 
>      if (job) {
> -        backup_wait_for_overlapping_requests(child->bs->job, sector_num,
> -                                             remaining_sectors);
> -        backup_cow_request_begin(&req, child->bs->job, sector_num,
> -                                 remaining_sectors);
> +        uint64_t remaining_bytes = remaining_sectors * BDRV_SECTOR_SIZE;
> +
> +        backup_wait_for_overlapping_requests(child->bs->job,
> +                                             sector_num * BDRV_SECTOR_SIZE,
> +                                             remaining_bytes);
> +        backup_cow_request_begin(&req, child->bs->job,
> +                                 sector_num * BDRV_SECTOR_SIZE,
> +                                 remaining_bytes);
>          ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors,
>                              qiov);
>          backup_cow_request_end(&req);
> 


* Re: [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based
  2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
                   ` (16 preceding siblings ...)
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based Eric Blake
@ 2017-04-17 23:42 ` John Snow
  2017-04-18  1:04   ` Eric Blake
  17 siblings, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-17 23:42 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block



On 04/11/2017 06:29 PM, Eric Blake wrote:
> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
> but NBD wants to report status on byte granularity (even if the
> reporting will probably be naturally aligned to sectors or even
> much higher levels).  I've therefore started the task of
> converting our block status code to report at a byte granularity
> rather than sectors.
> 
> This is part one of that conversion: bdrv_is_allocated().
> Other parts (still to be written) include tracking dirty bitmaps
> by bytes (it's still one bit per granularity, but now we won't
> be double-scaling from bytes to sectors to granularity), then
> replacing bdrv_get_block_status() with a byte based callback
> in all the drivers.
> 
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v1
> 
> It requires v9 or later of my prior work on blkdebug:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
> which in turn requires Max's block-next tree:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html
> 
> Eric Blake (17):
>   blockjob: Track job ratelimits via bytes, not sectors
>   trace: Show blockjob actions via bytes, not sectors
>   stream: Switch stream_populate() to byte-based
>   stream: Switch stream_run() to byte-based
>   commit: Switch commit_populate() to byte-based
>   commit: Switch commit_run() to byte-based
>   mirror: Switch MirrorBlockJob to byte-based
>   mirror: Switch mirror_do_zero_or_discard() to byte-based
>   mirror: Switch mirror_cow_align() to byte-based
>   mirror: Switch mirror_do_read() to byte-based
>   mirror: Switch mirror_iteration() to byte-based
>   backup: Switch BackupBlockJob to byte-based
>   backup: Switch block_backup.h to byte-based
>   backup: Switch backup_do_cow() to byte-based
>   backup: Switch backup_run() to byte-based
>   block: Make bdrv_is_allocated() byte-based
>   block: Make bdrv_is_allocated_above() byte-based
> 
>  include/block/block.h        |   6 +-
>  include/block/block_backup.h |  11 +-
>  include/qemu/ratelimit.h     |   3 +-
>  block/backup.c               | 126 ++++++++----------
>  block/commit.c               |  54 ++++----
>  block/io.c                   |  59 +++++----
>  block/mirror.c               | 300 ++++++++++++++++++++++---------------------
>  block/replication.c          |  29 +++--
>  block/stream.c               |  35 +++--
>  block/vvfat.c                |  34 +++--
>  migration/block.c            |   9 +-
>  qemu-img.c                   |  15 ++-
>  qemu-io-cmds.c               |  57 ++++----
>  block/trace-events           |  14 +-
>  14 files changed, 381 insertions(+), 371 deletions(-)

Shame, you added ten lines!

> 

Patches 1-15:

Reviewed-by: John Snow <jsnow@redhat.com>

9: Is there a good reason for a void fn() to return its argument via a
passed parameter? I see you're matching the other interface, but that
strikes me as wonky.

11: Looks correct to me, but this one's a bit hairier than the rest. How
many times do we truly need to round, adjust, clip, round again, align,
clip, round, align, ...

I'll take a peek at the last two tomorrow.

--js


* Re: [Qemu-devel] [Qemu-block] [PATCH 13/17] backup: Switch block_backup.h to byte-based
  2017-04-17 23:24   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2017-04-18  0:54     ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-18  0:54 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Changlong Xie, qemu-block, Wen Congyang, Max Reitz


On 04/17/2017 06:24 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> We are gradually converting to byte-based interfaces, as they are
>> easier to reason about than sector-based.  Continue by converting
>> the public interface to backup jobs (no semantic change), including
>> a change to CowRequest to track by bytes instead of cluster indices.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---

>>
>> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
>> -                                          int nb_sectors);
>> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
>> +                                          uint64_t bytes);
>>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
>> -                              int64_t sector_num,
>> -                              int nb_sectors);
>> +                              int64_t offset, uint64_t bytes);
>>  void backup_cow_request_end(CowRequest *req);
> 
> Should we adjust the parameter names of cow_request_begin and
> wait_for_overlapping_requests, too?

Sure, I can do that.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based
  2017-04-17 23:42 ` [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based John Snow
@ 2017-04-18  1:04   ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-18  1:04 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, qemu-block


On 04/17/2017 06:42 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
>> but NBD wants to report status on byte granularity (even if the
>> reporting will probably be naturally aligned to sectors or even
>> much higher levels).  I've therefore started the task of
>> converting our block status code to report at a byte granularity
>> rather than sectors.
>>
>> This is part one of that conversion: bdrv_is_allocated().
>> Other parts (still to be written) include tracking dirty bitmaps
>> by bytes (it's still one bit per granularity, but now we won't
>> be double-scaling from bytes to sectors to granularity),

That series is not only written, but reviewed now:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02163.html


>> then
>> replacing bdrv_get_block_status() with a byte based callback
>> in all the drivers.

Coming up soon.

>>
>> Available as a tag at:
>> git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v1
>>
>> It requires v9 or later of my prior work on blkdebug:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
>> which in turn requires Max's block-next tree:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html
>>

>>  include/block/block.h        |   6 +-
>>  include/block/block_backup.h |  11 +-
>>  include/qemu/ratelimit.h     |   3 +-
>>  block/backup.c               | 126 ++++++++----------
>>  block/commit.c               |  54 ++++----
>>  block/io.c                   |  59 +++++----
>>  block/mirror.c               | 300 ++++++++++++++++++++++---------------------
>>  block/replication.c          |  29 +++--
>>  block/stream.c               |  35 +++--
>>  block/vvfat.c                |  34 +++--
>>  migration/block.c            |   9 +-
>>  qemu-img.c                   |  15 ++-
>>  qemu-io-cmds.c               |  57 ++++----
>>  block/trace-events           |  14 +-
>>  14 files changed, 381 insertions(+), 371 deletions(-)
> 
> Shame, you added ten lines!

Yeah, some of that back-and-forth scaling is verbose and forced line
breaks in statements that used to fit on one line.  Of course, the second series
regained those ten lines and more, once I got to flatten some of the
interfaces away from repeated scaling:

$ git diff --stat nbd-byte-allocated-v1..nbd-byte-dirty-v1 | cat
 include/block/block_int.h    |  2 +-
 include/block/dirty-bitmap.h | 21 ++++-------
 block/backup.c               |  7 ++--
 block/dirty-bitmap.c         | 83 ++++++++++++--------------------------------
 block/io.c                   |  6 ++--
 block/mirror.c               | 73 +++++++++++++++++---------------------
 migration/block.c            | 14 ++++----
 7 files changed, 74 insertions(+), 132 deletions(-)

> 
>>
> 
> Patches 1-15:
> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 
> 9: Is there a good reason for a void fn() to return its argument via a
> passed parameter? I see you're matching the other interface, but that
> strikes me as wonky.

It bothered me a bit too; beyond the copy-and-paste factor, my
justification was that the parameter is both input and output, rather
than output-only. But I don't mind respinning to add a preliminary patch
that fixes mirror_clip_sectors() to return a value, then patterning
mirror_clip_bytes() after it.

> 
> 11: Looks correct to me, but this one's a bit hairier than the rest. How
> many times do we truly need to round, adjust, clip, round again, align,
> clip, round, align, ...

I don't know that there are that many rounds of clipping going on, but
there is definitely a lot of scaling, and it gets better as later
patches make even more things byte-based.

> 
> I'll take a peek at the last two tomorrow.

Thanks for plodding through it so far. For supposedly being no semantic
change, it is still definitely a lot to think about.  But the end result
is more legible, in my opinion.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based Eric Blake
@ 2017-04-18 22:15   ` John Snow
  2017-04-19 17:54     ` Eric Blake
  2017-04-19 20:32   ` John Snow
  1 sibling, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-18 22:15 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi



On 04/11/2017 06:29 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, for the most part this patch is just the
> addition of scaling at the callers followed by inverse scaling at
> bdrv_is_allocated().  But some code, particularly bdrv_commit(),
> gets a lot simpler because it no longer has to mess with sectors;
> also, it is now possible to pass NULL if the caller does not care
> how much of the image is allocated beyond the initial offset.
> 
> For ease of review, bdrv_is_allocated_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/block.h |  4 ++--
>  block/backup.c        | 17 +++++----------
>  block/commit.c        | 21 ++++++++-----------
>  block/io.c            | 49 +++++++++++++++++++++++++++++--------------
>  block/stream.c        |  5 +++--
>  block/vvfat.c         | 34 +++++++++++++++++-------------
>  migration/block.c     |  9 ++++----
>  qemu-img.c            |  5 ++++-
>  qemu-io-cmds.c        | 57 +++++++++++++++++++++++++--------------------------
>  9 files changed, 110 insertions(+), 91 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index b5289f7..8641149 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -422,8 +422,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>                                      int64_t sector_num,
>                                      int nb_sectors, int *pnum,
>                                      BlockDriverState **file);
> -int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
> -                      int *pnum);
> +int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +                      int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
>                              int64_t sector_num, int nb_sectors, int *pnum);
> 
> diff --git a/block/backup.c b/block/backup.c
> index 2a51e8b..63ca208 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
>      QLIST_HEAD(, CowRequest) inflight_reqs;
>  } BackupBlockJob;
> 
> -/* Size of a cluster in sectors, instead of bytes. */
> -static inline int64_t cluster_size_sectors(BackupBlockJob *job)
> -{
> -  return job->cluster_size / BDRV_SECTOR_SIZE;
> -}
> -
>  /* See if in-flight requests overlap and wait for them to complete */
>  static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
>                                                         int64_t start,
> @@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
>      BackupCompleteData *data;
>      BlockDriverState *bs = blk_bs(job->common.blk);
>      int64_t offset;
> -    int64_t sectors_per_cluster = cluster_size_sectors(job);
>      int ret = 0;
> 
>      QLIST_INIT(&job->inflight_reqs);
> @@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
>              }
> 
>              if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
> -                int i, n;
> +                int i;
> +                int64_t n;
> 
>                  /* Check to see if these blocks are already in the
>                   * backing file. */
> 
> -                for (i = 0; i < sectors_per_cluster;) {
> +                for (i = 0; i < job->cluster_size;) {
>                      /* bdrv_is_allocated() only returns true/false based
>                       * on the first set of sectors it comes across that
>                       * are are all in the same state.
> @@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
>                       * backup cluster length.  We end up copying more than
>                       * needed but at some point that is always the case. */
>                      alloced =
> -                        bdrv_is_allocated(bs,
> -                                          (offset >> BDRV_SECTOR_BITS) + i,
> -                                          sectors_per_cluster - i, &n);
> +                        bdrv_is_allocated(bs, offset + i,
> +                                          job->cluster_size - i, &n);
>                      i += n;
> 
>                      if (alloced || n == 0) {
> diff --git a/block/commit.c b/block/commit.c
> index b7d3e60..4d6bb2a 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -429,7 +429,7 @@ fail:
>  }
> 
> 
> -#define COMMIT_BUF_SECTORS 2048
> +#define COMMIT_BUF_SIZE (2048 * BDRV_SECTOR_SIZE)
> 
>  /* commit COW file into the raw image */
>  int bdrv_commit(BlockDriverState *bs)
> @@ -438,8 +438,9 @@ int bdrv_commit(BlockDriverState *bs)
>      BlockDriverState *backing_file_bs = NULL;
>      BlockDriverState *commit_top_bs = NULL;
>      BlockDriver *drv = bs->drv;
> -    int64_t sector, total_sectors, length, backing_length;
> -    int n, ro, open_flags;
> +    int64_t offset, length, backing_length;
> +    int ro, open_flags;
> +    int64_t n;
>      int ret = 0;
>      uint8_t *buf = NULL;
>      Error *local_err = NULL;
> @@ -517,30 +518,26 @@ int bdrv_commit(BlockDriverState *bs)
>          }
>      }
> 
> -    total_sectors = length >> BDRV_SECTOR_BITS;
> -
>      /* blk_try_blockalign() for src will choose an alignment that works for
>       * backing as well, so no need to compare the alignment manually. */
> -    buf = blk_try_blockalign(src, COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
> +    buf = blk_try_blockalign(src, COMMIT_BUF_SIZE);
>      if (buf == NULL) {
>          ret = -ENOMEM;
>          goto ro_cleanup;
>      }
> 
> -    for (sector = 0; sector < total_sectors; sector += n) {
> -        ret = bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n);
> +    for (offset = 0; offset < length; offset += n) {
> +        ret = bdrv_is_allocated(bs, offset, COMMIT_BUF_SIZE, &n);
>          if (ret < 0) {
>              goto ro_cleanup;
>          }
>          if (ret) {
> -            ret = blk_pread(src, sector * BDRV_SECTOR_SIZE, buf,
> -                            n * BDRV_SECTOR_SIZE);
> +            ret = blk_pread(src, offset, buf, n);
>              if (ret < 0) {
>                  goto ro_cleanup;
>              }
> 
> -            ret = blk_pwrite(backing, sector * BDRV_SECTOR_SIZE, buf,
> -                             n * BDRV_SECTOR_SIZE, 0);
> +            ret = blk_pwrite(backing, offset, buf, n, 0);
>              if (ret < 0) {
>                  goto ro_cleanup;
>              }
> diff --git a/block/io.c b/block/io.c
> index 8706bfa..438a493 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1054,14 +1054,15 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>          int64_t start_sector = offset >> BDRV_SECTOR_BITS;
>          int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
>          unsigned int nb_sectors = end_sector - start_sector;
> -        int pnum;
> +        int64_t pnum;
> 
> -        ret = bdrv_is_allocated(bs, start_sector, nb_sectors, &pnum);
> +        ret = bdrv_is_allocated(bs, start_sector << BDRV_SECTOR_BITS,
> +                                nb_sectors << BDRV_SECTOR_BITS, &pnum);
>          if (ret < 0) {
>              goto out;
>          }
> 
> -        if (!ret || pnum != nb_sectors) {
> +        if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
>              ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
>              goto out;
>          }
> @@ -1903,15 +1904,26 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
>                                         sector_num, nb_sectors, pnum, file);
>  }
> 
> -int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
> -                                   int nb_sectors, int *pnum)
> +int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
> +                                   int64_t bytes, int64_t *pnum)
>  {
>      BlockDriverState *file;
> -    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
> -                                        &file);
> +    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
> +    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
> +    int64_t ret;
> +    int psectors;
> +
> +    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
> +           bytes < INT_MAX * BDRV_SECTOR_SIZE);
> +    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
> +                                &file);
>      if (ret < 0) {
>          return ret;
>      }
> +    if (pnum) {
> +        *pnum = psectors * BDRV_SECTOR_SIZE;
> +    }
>      return !!(ret & BDRV_BLOCK_ALLOCATED);
>  }
> 
> @@ -1920,7 +1932,8 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>   *
>   * Return true if the given sector is allocated in any image between
>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
> - * sector is allocated in any image of the chain.  Return false otherwise.
> + * sector is allocated in any image of the chain.  Return false otherwise,
> + * or negative errno on failure.
>   *
>   * 'pnum' is set to the number of sectors (including and immediately following
>   *  the specified sector) that are known to be in the same
> @@ -1937,13 +1950,19 @@ int bdrv_is_allocated_above(BlockDriverState *top,
> 
>      intermediate = top;
>      while (intermediate && intermediate != base) {
> -        int pnum_inter;
> -        ret = bdrv_is_allocated(intermediate, sector_num, nb_sectors,
> +        int64_t pnum_inter;
> +        int psectors_inter;
> +
> +        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
> +                                nb_sectors * BDRV_SECTOR_SIZE,
>                                  &pnum_inter);
>          if (ret < 0) {
>              return ret;
> -        } else if (ret) {
> -            *pnum = pnum_inter;
> +        }
> +        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
> +        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
> +        if (ret) {
> +            *pnum = psectors_inter;
>              return 1;
>          }
> 
> @@ -1953,10 +1972,10 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>           *
>           * [sector_num+x, nr_sectors] allocated.
>           */
> -        if (n > pnum_inter &&
> +        if (n > psectors_inter &&
>              (intermediate == top ||
> -             sector_num + pnum_inter < intermediate->total_sectors)) {
> -            n = pnum_inter;
> +             sector_num + psectors_inter < intermediate->total_sectors)) {
> +            n = psectors_inter;
>          }
> 
>          intermediate = backing_bs(intermediate);
> diff --git a/block/stream.c b/block/stream.c
> index 3ed4fef..85502eb 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -138,6 +138,7 @@ static void coroutine_fn stream_run(void *opaque)
> 
>      for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
>          bool copy;
> +        int64_t count = 0;
> 
>          /* Note that even when no rate limit is applied we need to yield
>           * with no pending I/O here so that bdrv_drain_all() returns.
> @@ -149,8 +150,8 @@ static void coroutine_fn stream_run(void *opaque)
> 
>          copy = false;
> 
> -        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
> -                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
> +        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
> +        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>          if (ret == 1) {
>              /* Allocated in the top, no need to copy.  */
>          } else if (ret >= 0) {
> diff --git a/block/vvfat.c b/block/vvfat.c
> index af5153d..bef2056 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -1393,24 +1393,27 @@ static int vvfat_read(BlockDriverState *bs, int64_t sector_num,
>  	if (sector_num >= bs->total_sectors)
>  	   return -1;
>  	if (s->qcow) {
> -	    int n;
> +            int64_t n;
>              int ret;
> -            ret = bdrv_is_allocated(s->qcow->bs, sector_num,
> -                                    nb_sectors - i, &n);
> +            ret = bdrv_is_allocated(s->qcow->bs, sector_num * BDRV_SECTOR_SIZE,
> +                                    (nb_sectors - i) * BDRV_SECTOR_SIZE, &n);
>              if (ret < 0) {
>                  return ret;
>              }
>              if (ret) {
> -                DLOG(fprintf(stderr, "sectors %d+%d allocated\n",
> -                             (int)sector_num, n));
> -                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200, n)) {
> +                DLOG(fprintf(stderr, "sectors %" PRId64 "+%" PRId64
> +                             " allocated\n", sector_num,
> +                             n >> BDRV_SECTOR_BITS));
> +                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200,
> +                              n >> BDRV_SECTOR_BITS)) {
>                      return -1;
>                  }
> -                i += n - 1;
> -                sector_num += n - 1;
> +                i += (n >> BDRV_SECTOR_BITS) - 1;
> +                sector_num += (n >> BDRV_SECTOR_BITS) - 1;
>                  continue;
>              }
> -DLOG(fprintf(stderr, "sector %d not allocated\n", (int)sector_num));
> +            DLOG(fprintf(stderr, "sector %" PRId64 " not allocated\n",
> +                         sector_num));
>  	}
>  	if(sector_num<s->faked_sectors) {
>  	    if(sector_num<s->first_sectors_number)
> @@ -1678,7 +1681,7 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,
>                                          uint32_t cluster_num)
>  {
>      int was_modified = 0;
> -    int i, dummy;
> +    int i;
> 
>      if (s->qcow == NULL) {
>          return 0;
> @@ -1686,8 +1689,9 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,
> 
>      for (i = 0; !was_modified && i < s->sectors_per_cluster; i++) {
>          was_modified = bdrv_is_allocated(s->qcow->bs,
> -                                         cluster2sector(s, cluster_num) + i,
> -                                         1, &dummy);
> +                                         (cluster2sector(s, cluster_num) +

cluster2sector2bytes2bits2atoms()

> +                                          i) * BDRV_SECTOR_SIZE,
> +                                         BDRV_SECTOR_SIZE, NULL);
>      }
> 
>      /*
> @@ -1834,7 +1838,7 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
>  	    }
> 
>  	    if (copy_it) {
> -		int i, dummy;
> +                int i;
>  		/*
>  		 * This is horribly inefficient, but that is okay, since
>  		 * it is rarely executed, if at all.
> @@ -1845,7 +1849,9 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
>                  for (i = 0; i < s->sectors_per_cluster; i++) {
>                      int res;
> 
> -                    res = bdrv_is_allocated(s->qcow->bs, offset + i, 1, &dummy);
> +                    res = bdrv_is_allocated(s->qcow->bs,
> +                                            (offset + i) * BDRV_SECTOR_SIZE,
> +                                            BDRV_SECTOR_SIZE, NULL);
>                      if (res < 0) {
>                          return -1;
>                      }
> diff --git a/migration/block.c b/migration/block.c
> index 7734ff7..18d50ff 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -36,7 +36,7 @@
>  #define BLK_MIG_FLAG_PROGRESS           0x04
>  #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
> 
> -#define MAX_IS_ALLOCATED_SEARCH 65536
> +#define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
> 
>  #define MAX_INFLIGHT_IO 512
> 
> @@ -272,6 +272,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>      BlockBackend *bb = bmds->blk;
>      BlkMigBlock *blk;
>      int nr_sectors;
> +    int64_t count;
> 
>      if (bmds->shared_base) {
>          qemu_mutex_lock_iothread();
> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>          /* Skip unallocated sectors; intentionally treats failure as
>           * an allocated sector */
>          while (cur_sector < total_sectors &&
> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
> -            cur_sector += nr_sectors;
> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);

Hm, what's the story here, why are we rounding this up? If, in theory,
we have a partially allocated cluster won't we advance past that?

>          }
>          aio_context_release(blk_get_aio_context(bb));
>          qemu_mutex_unlock_iothread();
> diff --git a/qemu-img.c b/qemu-img.c
> index 37c2894..2f21d69 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3207,6 +3207,7 @@ static int img_rebase(int argc, char **argv)
>          int64_t new_backing_num_sectors = 0;
>          uint64_t sector;
>          int n;
> +        int64_t count;
>          float local_progress = 0;
> 
>          buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> @@ -3254,12 +3255,14 @@ static int img_rebase(int argc, char **argv)
>              }
> 
>              /* If the cluster is allocated, we don't need to take action */
> -            ret = bdrv_is_allocated(bs, sector, n, &n);
> +            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
> +                                    n << BDRV_SECTOR_BITS, &count);
>              if (ret < 0) {
>                  error_report("error while reading image metadata: %s",
>                               strerror(-ret));
>                  goto out;
>              }
> +            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>              if (ret) {
>                  continue;
>              }
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 0ac7457..62278ea 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1760,12 +1760,12 @@ out:
>  static int alloc_f(BlockBackend *blk, int argc, char **argv)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> -    int64_t offset, sector_num, nb_sectors, remaining, bytes;
> +    int64_t offset, start, remaining, bytes;
>      char s1[64];
> -    int num, ret;
> -    int64_t sum_alloc;
> +    int ret;
> +    int64_t num, sum_alloc;
> 
> -    offset = cvtnum(argv[1]);
> +    start = offset = cvtnum(argv[1]);
>      if (offset < 0) {
>          print_cvtnum_err(offset, argv[1]);
>          return 0;
> @@ -1793,32 +1793,30 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>                 bytes);
>          return 0;
>      }
> -    nb_sectors = bytes >> BDRV_SECTOR_BITS;
> 
> -    remaining = nb_sectors;
> +    remaining = bytes;
>      sum_alloc = 0;
> -    sector_num = offset >> 9;
>      while (remaining) {
> -        ret = bdrv_is_allocated(bs, sector_num, remaining, &num);
> +        ret = bdrv_is_allocated(bs, offset, remaining, &num);
>          if (ret < 0) {
>              printf("is_allocated failed: %s\n", strerror(-ret));
>              return 0;
>          }
> -        sector_num += num;
> +        offset += num;
>          remaining -= num;
>          if (ret) {
>              sum_alloc += num;
>          }
>          if (num == 0) {
> -            nb_sectors -= remaining;
> +            bytes -= remaining;
>              remaining = 0;
>          }
>      }
> 
> -    cvtstr(offset, s1, sizeof(s1));
> +    cvtstr(start, s1, sizeof(s1));
> 
>      printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
> -           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
> +           sum_alloc, bytes, s1);
>      return 0;
>  }
> 
> @@ -1833,14 +1831,15 @@ static const cmdinfo_t alloc_cmd = {
>  };
> 
> 
> -static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
> -                            int64_t nb_sectors, int64_t *pnum)
> +static int map_is_allocated(BlockDriverState *bs, int64_t offset,
> +                            int64_t bytes, int64_t *pnum)
>  {
> -    int num, num_checked;
> +    int64_t num;
> +    int num_checked;
>      int ret, firstret;
> 
> -    num_checked = MIN(nb_sectors, INT_MAX);
> -    ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
> +    num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
> +    ret = bdrv_is_allocated(bs, offset, num_checked, &num);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -1848,12 +1847,12 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
>      firstret = ret;
>      *pnum = num;
> 
> -    while (nb_sectors > 0 && ret == firstret) {
> -        sector_num += num;
> -        nb_sectors -= num;
> +    while (bytes > 0 && ret == firstret) {
> +        offset += num;
> +        bytes -= num;
> 
> -        num_checked = MIN(nb_sectors, INT_MAX);
> -        ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
> +        num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
> +        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
>          if (ret == firstret && num) {
>              *pnum += num;
>          } else {
> @@ -1867,7 +1866,7 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
>  static int map_f(BlockBackend *blk, int argc, char **argv)
>  {
>      int64_t offset;
> -    int64_t nb_sectors, total_sectors;
> +    int64_t bytes, total_sectors;
>      char s1[64];
>      int64_t num;
>      int ret;
> @@ -1881,10 +1880,10 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
>          return 0;
>      }
> 
> -    nb_sectors = total_sectors;
> +    bytes = total_sectors * BDRV_SECTOR_SIZE;
> 
>      do {
> -        ret = map_is_allocated(blk_bs(blk), offset, nb_sectors, &num);
> +        ret = map_is_allocated(blk_bs(blk), offset, bytes, &num);
>          if (ret < 0) {
>              error_report("Failed to get allocation status: %s", strerror(-ret));
>              return 0;
> @@ -1894,13 +1893,13 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
>          }
> 
>          retstr = ret ? "    allocated" : "not allocated";
> -        cvtstr(offset << 9ULL, s1, sizeof(s1));
> +        cvtstr(offset, s1, sizeof(s1));
>          printf("[% 24" PRId64 "] % 16" PRId64 " bytes %s at offset %s (%d)\n",
> -               offset << 9ULL, num << 9ULL, retstr, s1, ret);
> +               offset, num, retstr, s1, ret);
> 
>          offset += num;
> -        nb_sectors -= num;
> -    } while (offset < total_sectors);
> +        bytes -= num;
> +    } while (offset < total_sectors * BDRV_SECTOR_SIZE);
> 
>      return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 02/17] trace: Show blockjob actions via bytes, not sectors
  2017-04-17 19:55     ` Eric Blake
@ 2017-04-19  9:12       ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2017-04-19  9:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: John Snow, qemu-devel, kwolf, qemu-block, Max Reitz

On Mon, Apr 17, 2017 at 02:55:53PM -0500, Eric Blake wrote:
> On 04/17/2017 02:18 PM, John Snow wrote:
> >> @@ -346,13 +349,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> >>  backup_do_cow_skip(void *job, int64_t start) "job %p start %"PRId64
> >>  backup_do_cow_process(void *job, int64_t start) "job %p start %"PRId64
> >>  backup_do_cow_read_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"
> > 
> > I guess these three were "cluster" based, but you didn't need to change
> > anything to assert them as byte-based.
> > 
> >>
> > 
> > Under the assumption that it's okay to change tracing output without
> > being able to differentiate between new and old output:
> 
> I'll leave that to Stefan's discretion as trace maintainer, but my
> thoughts are that tracing is for debugging purposes, and as long as you
> know what binary you are debugging, you can figure out what the trace
> message meant by referring back to the source that generated that binary.

That's okay.  Trace events are not a stable interface.  Most trace
events are low-level and require access to the corresponding source tree
anyway.

The only exception is that tools in scripts/ may need to be updated when
trace events change (e.g. scripts to graph or analyze specific trace
events).

Stefan


* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-18 22:15   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2017-04-19 17:54     ` Eric Blake
  2017-04-19 19:37       ` John Snow
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-19 17:54 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi

On 04/18/2017 05:15 PM, John Snow wrote:

>> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>          /* Skip unallocated sectors; intentionally treats failure as
>>           * an allocated sector */
>>          while (cur_sector < total_sectors &&
>> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
>> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>> -            cur_sector += nr_sectors;
>> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
>> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
> 
> Hm, what's the story here, why are we rounding this up? If, in theory,
> we have a partially allocated cluster won't we advance past that?

This is rounding to sectors, not clusters (that is, the code is supposed
to advance cur_sector identically pre- and post-patch).  As to whether
the overall algorithm makes sense, or could use some tweaking by
converting migration/block.c to do everything by bytes instead of by
sectors, I haven't yet given that any serious time.


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-19 17:54     ` Eric Blake
@ 2017-04-19 19:37       ` John Snow
  0 siblings, 0 replies; 38+ messages in thread
From: John Snow @ 2017-04-19 19:37 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi



On 04/19/2017 01:54 PM, Eric Blake wrote:
> On 04/18/2017 05:15 PM, John Snow wrote:
> 
>>> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>>          /* Skip unallocated sectors; intentionally treats failure as
>>>           * an allocated sector */
>>>          while (cur_sector < total_sectors &&
>>> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
>>> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>>> -            cur_sector += nr_sectors;
>>> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>>> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
>>> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>
>> Hm, what's the story here, why are we rounding this up? If, in theory,
>> we have a partially allocated cluster won't we advance past that?
> 
> This is rounding to sectors, not clusters (that is, the code is supposed
> to advance cur_sector identically pre- and post-patch).  As to whether
> the overall algorithm makes sense, or could use some tweaking by
> converting migration/block.c to do everything by bytes instead of by
> sectors, I haven't yet given that any serious time.
> 

Err, right... temporary brain schism. DIV_ROUND_UP not ALIGN_UP, my mistake.

> 


* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based Eric Blake
  2017-04-18 22:15   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2017-04-19 20:32   ` John Snow
  2017-04-19 21:12     ` Eric Blake
  1 sibling, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-19 20:32 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi



On 04/11/2017 06:29 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, for the most part this patch is just the
> addition of scaling at the callers followed by inverse scaling at
> bdrv_is_allocated().  But some code, particularly bdrv_commit(),
> gets a lot simpler because it no longer has to mess with sectors;
> also, it is now possible to pass NULL if the caller does not care
> how much of the image is allocated beyond the initial offset.
> 
> For ease of review, bdrv_is_allocated_above() will be tackled
> separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/block.h |  4 ++--
>  block/backup.c        | 17 +++++----------
>  block/commit.c        | 21 ++++++++-----------
>  block/io.c            | 49 +++++++++++++++++++++++++++++--------------
>  block/stream.c        |  5 +++--
>  block/vvfat.c         | 34 +++++++++++++++++-------------
>  migration/block.c     |  9 ++++----
>  qemu-img.c            |  5 ++++-
>  qemu-io-cmds.c        | 57 +++++++++++++++++++++++++--------------------------
>  9 files changed, 110 insertions(+), 91 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index b5289f7..8641149 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -422,8 +422,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>                                      int64_t sector_num,
>                                      int nb_sectors, int *pnum,
>                                      BlockDriverState **file);
> -int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
> -                      int *pnum);
> +int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +                      int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
>                              int64_t sector_num, int nb_sectors, int *pnum);
> 
> diff --git a/block/backup.c b/block/backup.c
> index 2a51e8b..63ca208 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
>      QLIST_HEAD(, CowRequest) inflight_reqs;
>  } BackupBlockJob;
> 
> -/* Size of a cluster in sectors, instead of bytes. */
> -static inline int64_t cluster_size_sectors(BackupBlockJob *job)
> -{
> -  return job->cluster_size / BDRV_SECTOR_SIZE;
> -}
> -
>  /* See if in-flight requests overlap and wait for them to complete */
>  static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
>                                                         int64_t start,
> @@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
>      BackupCompleteData *data;
>      BlockDriverState *bs = blk_bs(job->common.blk);
>      int64_t offset;
> -    int64_t sectors_per_cluster = cluster_size_sectors(job);
>      int ret = 0;
> 
>      QLIST_INIT(&job->inflight_reqs);
> @@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
>              }
> 
>              if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
> -                int i, n;
> +                int i;
> +                int64_t n;
> 
>                  /* Check to see if these blocks are already in the
>                   * backing file. */
> 
> -                for (i = 0; i < sectors_per_cluster;) {
> +                for (i = 0; i < job->cluster_size;) {
>                      /* bdrv_is_allocated() only returns true/false based
>                       * on the first set of sectors it comes across that
>                       * are are all in the same state.
> @@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
>                       * backup cluster length.  We end up copying more than
>                       * needed but at some point that is always the case. */
>                      alloced =
> -                        bdrv_is_allocated(bs,
> -                                          (offset >> BDRV_SECTOR_BITS) + i,
> -                                          sectors_per_cluster - i, &n);
> +                        bdrv_is_allocated(bs, offset + i,
> +                                          job->cluster_size - i, &n);
>                      i += n;
> 
>                      if (alloced || n == 0) {
> diff --git a/block/commit.c b/block/commit.c
> index b7d3e60..4d6bb2a 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -429,7 +429,7 @@ fail:
>  }
> 
> 
> -#define COMMIT_BUF_SECTORS 2048
> +#define COMMIT_BUF_SIZE (2048 * BDRV_SECTOR_SIZE)
> 
>  /* commit COW file into the raw image */
>  int bdrv_commit(BlockDriverState *bs)
> @@ -438,8 +438,9 @@ int bdrv_commit(BlockDriverState *bs)
>      BlockDriverState *backing_file_bs = NULL;
>      BlockDriverState *commit_top_bs = NULL;
>      BlockDriver *drv = bs->drv;
> -    int64_t sector, total_sectors, length, backing_length;
> -    int n, ro, open_flags;
> +    int64_t offset, length, backing_length;
> +    int ro, open_flags;
> +    int64_t n;
>      int ret = 0;
>      uint8_t *buf = NULL;
>      Error *local_err = NULL;
> @@ -517,30 +518,26 @@ int bdrv_commit(BlockDriverState *bs)
>          }
>      }
> 
> -    total_sectors = length >> BDRV_SECTOR_BITS;
> -
>      /* blk_try_blockalign() for src will choose an alignment that works for
>       * backing as well, so no need to compare the alignment manually. */
> -    buf = blk_try_blockalign(src, COMMIT_BUF_SECTORS * BDRV_SECTOR_SIZE);
> +    buf = blk_try_blockalign(src, COMMIT_BUF_SIZE);
>      if (buf == NULL) {
>          ret = -ENOMEM;
>          goto ro_cleanup;
>      }
> 
> -    for (sector = 0; sector < total_sectors; sector += n) {
> -        ret = bdrv_is_allocated(bs, sector, COMMIT_BUF_SECTORS, &n);
> +    for (offset = 0; offset < length; offset += n) {
> +        ret = bdrv_is_allocated(bs, offset, COMMIT_BUF_SIZE, &n);
>          if (ret < 0) {
>              goto ro_cleanup;
>          }
>          if (ret) {
> -            ret = blk_pread(src, sector * BDRV_SECTOR_SIZE, buf,
> -                            n * BDRV_SECTOR_SIZE);
> +            ret = blk_pread(src, offset, buf, n);
>              if (ret < 0) {
>                  goto ro_cleanup;
>              }
> 
> -            ret = blk_pwrite(backing, sector * BDRV_SECTOR_SIZE, buf,
> -                             n * BDRV_SECTOR_SIZE, 0);
> +            ret = blk_pwrite(backing, offset, buf, n, 0);
>              if (ret < 0) {
>                  goto ro_cleanup;
>              }
> diff --git a/block/io.c b/block/io.c
> index 8706bfa..438a493 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1054,14 +1054,15 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>          int64_t start_sector = offset >> BDRV_SECTOR_BITS;
>          int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
>          unsigned int nb_sectors = end_sector - start_sector;
> -        int pnum;
> +        int64_t pnum;
> 
> -        ret = bdrv_is_allocated(bs, start_sector, nb_sectors, &pnum);
> +        ret = bdrv_is_allocated(bs, start_sector << BDRV_SECTOR_BITS,
> +                                nb_sectors << BDRV_SECTOR_BITS, &pnum);
>          if (ret < 0) {
>              goto out;
>          }
> 
> -        if (!ret || pnum != nb_sectors) {
> +        if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
>              ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
>              goto out;
>          }
> @@ -1903,15 +1904,26 @@ int64_t bdrv_get_block_status(BlockDriverState *bs,
>                                         sector_num, nb_sectors, pnum, file);
>  }
> 
> -int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
> -                                   int nb_sectors, int *pnum)
> +int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
> +                                   int64_t bytes, int64_t *pnum)
>  {
>      BlockDriverState *file;
> -    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
> -                                        &file);
> +    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
> +    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
> +    int64_t ret;
> +    int psectors;
> +
> +    assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> +    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
> +           bytes < INT_MAX * BDRV_SECTOR_SIZE);
> +    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
> +                                &file);
>      if (ret < 0) {
>          return ret;
>      }
> +    if (pnum) {
> +        *pnum = psectors * BDRV_SECTOR_SIZE;
> +    }
>      return !!(ret & BDRV_BLOCK_ALLOCATED);
>  }
> 
> @@ -1920,7 +1932,8 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>   *
>   * Return true if the given sector is allocated in any image between
>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
> - * sector is allocated in any image of the chain.  Return false otherwise.
> + * sector is allocated in any image of the chain.  Return false otherwise,
> + * or negative errno on failure.
>   *
>   * 'pnum' is set to the number of sectors (including and immediately following
>   *  the specified sector) that are known to be in the same
> @@ -1937,13 +1950,19 @@ int bdrv_is_allocated_above(BlockDriverState *top,
> 
>      intermediate = top;
>      while (intermediate && intermediate != base) {
> -        int pnum_inter;
> -        ret = bdrv_is_allocated(intermediate, sector_num, nb_sectors,
> +        int64_t pnum_inter;
> +        int psectors_inter;
> +
> +        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
> +                                nb_sectors * BDRV_SECTOR_SIZE,
>                                  &pnum_inter);
>          if (ret < 0) {
>              return ret;
> -        } else if (ret) {
> -            *pnum = pnum_inter;
> +        }
> +        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
> +        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
> +        if (ret) {
> +            *pnum = psectors_inter;
>              return 1;
>          }
> 
> @@ -1953,10 +1972,10 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>           *
>           * [sector_num+x, nr_sectors] allocated.
>           */
> -        if (n > pnum_inter &&
> +        if (n > psectors_inter &&
>              (intermediate == top ||
> -             sector_num + pnum_inter < intermediate->total_sectors)) {
> -            n = pnum_inter;
> +             sector_num + psectors_inter < intermediate->total_sectors)) {
> +            n = psectors_inter;
>          }
> 
>          intermediate = backing_bs(intermediate);
> diff --git a/block/stream.c b/block/stream.c
> index 3ed4fef..85502eb 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -138,6 +138,7 @@ static void coroutine_fn stream_run(void *opaque)
> 
>      for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
>          bool copy;
> +        int64_t count = 0;
> 
>          /* Note that even when no rate limit is applied we need to yield
>           * with no pending I/O here so that bdrv_drain_all() returns.
> @@ -149,8 +150,8 @@ static void coroutine_fn stream_run(void *opaque)
> 
>          copy = false;
> 
> -        ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
> -                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
> +        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
> +        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>          if (ret == 1) {
>              /* Allocated in the top, no need to copy.  */
>          } else if (ret >= 0) {
> diff --git a/block/vvfat.c b/block/vvfat.c
> index af5153d..bef2056 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -1393,24 +1393,27 @@ static int vvfat_read(BlockDriverState *bs, int64_t sector_num,
>  	if (sector_num >= bs->total_sectors)
>  	   return -1;
>  	if (s->qcow) {
> -	    int n;
> +            int64_t n;
>              int ret;
> -            ret = bdrv_is_allocated(s->qcow->bs, sector_num,
> -                                    nb_sectors - i, &n);
> +            ret = bdrv_is_allocated(s->qcow->bs, sector_num * BDRV_SECTOR_SIZE,
> +                                    (nb_sectors - i) * BDRV_SECTOR_SIZE, &n);
>              if (ret < 0) {
>                  return ret;
>              }
>              if (ret) {
> -                DLOG(fprintf(stderr, "sectors %d+%d allocated\n",
> -                             (int)sector_num, n));
> -                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200, n)) {
> +                DLOG(fprintf(stderr, "sectors %" PRId64 "+%" PRId64
> +                             " allocated\n", sector_num,
> +                             n >> BDRV_SECTOR_BITS));
> +                if (bdrv_read(s->qcow, sector_num, buf + i * 0x200,
> +                              n >> BDRV_SECTOR_BITS)) {
>                      return -1;
>                  }
> -                i += n - 1;
> -                sector_num += n - 1;
> +                i += (n >> BDRV_SECTOR_BITS) - 1;
> +                sector_num += (n >> BDRV_SECTOR_BITS) - 1;
>                  continue;
>              }
> -DLOG(fprintf(stderr, "sector %d not allocated\n", (int)sector_num));
> +            DLOG(fprintf(stderr, "sector %" PRId64 " not allocated\n",
> +                         sector_num));
>  	}
>  	if(sector_num<s->faked_sectors) {
>  	    if(sector_num<s->first_sectors_number)
> @@ -1678,7 +1681,7 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,
>                                          uint32_t cluster_num)
>  {
>      int was_modified = 0;
> -    int i, dummy;
> +    int i;
> 
>      if (s->qcow == NULL) {
>          return 0;
> @@ -1686,8 +1689,9 @@ static inline bool cluster_was_modified(BDRVVVFATState *s,
> 
>      for (i = 0; !was_modified && i < s->sectors_per_cluster; i++) {
>          was_modified = bdrv_is_allocated(s->qcow->bs,
> -                                         cluster2sector(s, cluster_num) + i,
> -                                         1, &dummy);
> +                                         (cluster2sector(s, cluster_num) +
> +                                          i) * BDRV_SECTOR_SIZE,
> +                                         BDRV_SECTOR_SIZE, NULL);
>      }
> 
>      /*
> @@ -1834,7 +1838,7 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
>  	    }
> 
>  	    if (copy_it) {
> -		int i, dummy;
> +                int i;
>  		/*
>  		 * This is horribly inefficient, but that is okay, since
>  		 * it is rarely executed, if at all.
> @@ -1845,7 +1849,9 @@ static uint32_t get_cluster_count_for_direntry(BDRVVVFATState* s,
>                  for (i = 0; i < s->sectors_per_cluster; i++) {
>                      int res;
> 
> -                    res = bdrv_is_allocated(s->qcow->bs, offset + i, 1, &dummy);
> +                    res = bdrv_is_allocated(s->qcow->bs,
> +                                            (offset + i) * BDRV_SECTOR_SIZE,
> +                                            BDRV_SECTOR_SIZE, NULL);
>                      if (res < 0) {
>                          return -1;
>                      }
> diff --git a/migration/block.c b/migration/block.c
> index 7734ff7..18d50ff 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -36,7 +36,7 @@
>  #define BLK_MIG_FLAG_PROGRESS           0x04
>  #define BLK_MIG_FLAG_ZERO_BLOCK         0x08
> 
> -#define MAX_IS_ALLOCATED_SEARCH 65536
> +#define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
> 
>  #define MAX_INFLIGHT_IO 512
> 
> @@ -272,6 +272,7 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>      BlockBackend *bb = bmds->blk;
>      BlkMigBlock *blk;
>      int nr_sectors;
> +    int64_t count;
> 
>      if (bmds->shared_base) {
>          qemu_mutex_lock_iothread();
> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>          /* Skip unallocated sectors; intentionally treats failure as
>           * an allocated sector */
>          while (cur_sector < total_sectors &&
> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
> -            cur_sector += nr_sectors;
> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>          }

Let's try asking again with the right vocabulary:

OK, so I guess the idea is that we definitely shouldn't ever have a
partially allocated sector; so I suppose this ROUND_UP is just a precaution?

What would it mean if this actually DID round up? Would we miss
transmitting a fraction of a sector because we assumed it was unallocated?

In other places, you scale down with X / BDRV_SECTOR_SIZE or an
equivalent bitshift, why does this receive a round *up* treatment?

Are we considering a future in which this function might actually give
us some unaligned answers? Do we need to re-audit all of the callers at
that point to make sure they cope appropriately?

--js

>          aio_context_release(blk_get_aio_context(bb));
>          qemu_mutex_unlock_iothread();
> diff --git a/qemu-img.c b/qemu-img.c
> index 37c2894..2f21d69 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -3207,6 +3207,7 @@ static int img_rebase(int argc, char **argv)
>          int64_t new_backing_num_sectors = 0;
>          uint64_t sector;
>          int n;
> +        int64_t count;
>          float local_progress = 0;
> 
>          buf_old = blk_blockalign(blk, IO_BUF_SIZE);
> @@ -3254,12 +3255,14 @@ static int img_rebase(int argc, char **argv)
>              }
> 
>              /* If the cluster is allocated, we don't need to take action */
> -            ret = bdrv_is_allocated(bs, sector, n, &n);
> +            ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
> +                                    n << BDRV_SECTOR_BITS, &count);
>              if (ret < 0) {
>                  error_report("error while reading image metadata: %s",
>                               strerror(-ret));
>                  goto out;
>              }
> +            n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>              if (ret) {
>                  continue;
>              }
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 0ac7457..62278ea 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1760,12 +1760,12 @@ out:
>  static int alloc_f(BlockBackend *blk, int argc, char **argv)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> -    int64_t offset, sector_num, nb_sectors, remaining, bytes;
> +    int64_t offset, start, remaining, bytes;
>      char s1[64];
> -    int num, ret;
> -    int64_t sum_alloc;
> +    int ret;
> +    int64_t num, sum_alloc;
> 
> -    offset = cvtnum(argv[1]);
> +    start = offset = cvtnum(argv[1]);
>      if (offset < 0) {
>          print_cvtnum_err(offset, argv[1]);
>          return 0;
> @@ -1793,32 +1793,30 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>                 bytes);
>          return 0;
>      }
> -    nb_sectors = bytes >> BDRV_SECTOR_BITS;
> 
> -    remaining = nb_sectors;
> +    remaining = bytes;
>      sum_alloc = 0;
> -    sector_num = offset >> 9;
>      while (remaining) {
> -        ret = bdrv_is_allocated(bs, sector_num, remaining, &num);
> +        ret = bdrv_is_allocated(bs, offset, remaining, &num);
>          if (ret < 0) {
>              printf("is_allocated failed: %s\n", strerror(-ret));
>              return 0;
>          }
> -        sector_num += num;
> +        offset += num;
>          remaining -= num;
>          if (ret) {
>              sum_alloc += num;
>          }
>          if (num == 0) {
> -            nb_sectors -= remaining;
> +            bytes -= remaining;
>              remaining = 0;
>          }
>      }
> 
> -    cvtstr(offset, s1, sizeof(s1));
> +    cvtstr(start, s1, sizeof(s1));
> 
>      printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
> -           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
> +           sum_alloc, bytes, s1);
>      return 0;
>  }
> 
> @@ -1833,14 +1831,15 @@ static const cmdinfo_t alloc_cmd = {
>  };
> 
> 
> -static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
> -                            int64_t nb_sectors, int64_t *pnum)
> +static int map_is_allocated(BlockDriverState *bs, int64_t offset,
> +                            int64_t bytes, int64_t *pnum)
>  {
> -    int num, num_checked;
> +    int64_t num;
> +    int num_checked;
>      int ret, firstret;
> 
> -    num_checked = MIN(nb_sectors, INT_MAX);
> -    ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
> +    num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
> +    ret = bdrv_is_allocated(bs, offset, num_checked, &num);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -1848,12 +1847,12 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
>      firstret = ret;
>      *pnum = num;
> 
> -    while (nb_sectors > 0 && ret == firstret) {
> -        sector_num += num;
> -        nb_sectors -= num;
> +    while (bytes > 0 && ret == firstret) {
> +        offset += num;
> +        bytes -= num;
> 
> -        num_checked = MIN(nb_sectors, INT_MAX);
> -        ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
> +        num_checked = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
> +        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
>          if (ret == firstret && num) {
>              *pnum += num;
>          } else {
> @@ -1867,7 +1866,7 @@ static int map_is_allocated(BlockDriverState *bs, int64_t sector_num,
>  static int map_f(BlockBackend *blk, int argc, char **argv)
>  {
>      int64_t offset;
> -    int64_t nb_sectors, total_sectors;
> +    int64_t bytes, total_sectors;
>      char s1[64];
>      int64_t num;
>      int ret;
> @@ -1881,10 +1880,10 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
>          return 0;
>      }
> 
> -    nb_sectors = total_sectors;
> +    bytes = total_sectors * BDRV_SECTOR_SIZE;
> 
>      do {
> -        ret = map_is_allocated(blk_bs(blk), offset, nb_sectors, &num);
> +        ret = map_is_allocated(blk_bs(blk), offset, bytes, &num);
>          if (ret < 0) {
>              error_report("Failed to get allocation status: %s", strerror(-ret));
>              return 0;
> @@ -1894,13 +1893,13 @@ static int map_f(BlockBackend *blk, int argc, char **argv)
>          }
> 
>          retstr = ret ? "    allocated" : "not allocated";
> -        cvtstr(offset << 9ULL, s1, sizeof(s1));
> +        cvtstr(offset, s1, sizeof(s1));
>          printf("[% 24" PRId64 "] % 16" PRId64 " bytes %s at offset %s (%d)\n",
> -               offset << 9ULL, num << 9ULL, retstr, s1, ret);
> +               offset, num, retstr, s1, ret);
> 
>          offset += num;
> -        nb_sectors -= num;
> -    } while (offset < total_sectors);
> +        bytes -= num;
> +    } while (offset < total_sectors * BDRV_SECTOR_SIZE);
> 
>      return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-19 20:32   ` John Snow
@ 2017-04-19 21:12     ` Eric Blake
  2017-04-19 21:40       ` John Snow
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2017-04-19 21:12 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 3026 bytes --]

On 04/19/2017 03:32 PM, John Snow wrote:

>> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>          /* Skip unallocated sectors; intentionally treats failure as
>>           * an allocated sector */
>>          while (cur_sector < total_sectors &&
>> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
>> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>> -            cur_sector += nr_sectors;
>> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
>> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>          }
> 
> Let's try asking again with the right vocabulary:
> 
> OK, so I guess the idea is that we definitely shouldn't ever have a
> partially allocated sector; so I suppose this ROUND_UP is just a precaution?

For now, yes - but the part 3 series (bdrv_get_block_status) makes
returning a sub-sector value possible.  For most drivers, it will only
happen at end-of-file (as status doesn't change mid-sector unless your
file is not a multiple of a sector size) - in fact, I found that I had
to patch the block layer to round up any sub-sector return that
coincides with end-of-file back to a round sector amount in order to
avoid breaking other code that assumed everything is allocated in sectors.

> 
> What would it mean if this actually DID round up? Would we miss
> transmitting a fraction of a sector because we assumed it was unallocated?

In general, the only mid-sector transition I encountered in testing was
at EOF, where the first half is allocated and the second half (beyond
EOF) was an implicit hole.  But here, you are worried about the opposite
case - can we ever have a hole that ends mid-sector, followed by actual
data in the second half.  Probably not, but I can add an assertion to
the generic block layer that mid-sector returns for an unallocated
answer can only happen at end-of-file.

> 
> In other places, you scale down with X / BDRV_SECTOR_SIZE or an
> equivalent bitshift, why does this receive a round *up* treatment?

Really? I thought I was being careful in this patch about ALWAYS
preserving the same sector value - the only time I did a bitshift was if
I already asserted the answer was aligned to sector boundaries to begin
with, and when I wasn't sure, it should be a round up.

> 
> Are we considering a future in which this function might actually give
> us some unaligned answers? Do we need to re-audit all of the callers at
> that point to make sure they cope appropriately?

Yes, I've been trying to do that all along, but a second set of eyes
never hurts.  Or, as I said, we can play it safe by guaranteeing that
even when byte-based, the block layer enforces sector-aligned answers.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-19 21:12     ` Eric Blake
@ 2017-04-19 21:40       ` John Snow
  2017-05-10 22:33         ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-19 21:40 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi



On 04/19/2017 05:12 PM, Eric Blake wrote:
> On 04/19/2017 03:32 PM, John Snow wrote:
> 
>>> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>>          /* Skip unallocated sectors; intentionally treats failure as
>>>           * an allocated sector */
>>>          while (cur_sector < total_sectors &&
>>> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
>>> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>>> -            cur_sector += nr_sectors;
>>> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>>> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
>>> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>>          }
>>
>> Let's try asking again with the right vocabulary:
>>
>> OK, so I guess the idea is that we definitely shouldn't ever have a
>> partially allocated sector; so I suppose this ROUND_UP is just a precaution?
> 
> For now, yes - but the part 3 series (bdrv_get_block_status) makes
> returning a sub-sector value possible.  For most drivers, it will only
> happen at end-of-file (as status doesn't change mid-sector unless your
> file is not a multiple of a sector size) - in fact, I found that I had
> to patch the block layer to round up any sub-sector return that
> coincides with end-of-file back to a round sector amount in order to
> avoid breaking other code that assumed everything is allocated in sectors.

I'm sorry, I'm having difficulty parsing your last sentence.

> 
>>
>> What would it mean if this actually DID round up? Would we miss
>> transmitting a fraction of a sector because we assumed it was unallocated?
> 
> In general, the only mid-sector transition I encountered in testing was
> at EOF, where the first half is allocated and the second half (beyond
> EOF) was an implicit hole.  But here, you are worried about the opposite
> case - can we ever have a hole that ends mid-sector, followed by actual
> data in the second half.  Probably not, but I can add an assertion to
> the generic block layer that mid-sector returns for an unallocated
> answer can only happen at end-of-file.
> 

Yeah, just curious about the implications from a naive-user point of
view, as it makes the algorithm in the caller look ugly now.

I agree with you that there is absolutely definitely no real problem
here right now, but I do secretly wonder if we'll invent a way to screw
ourselves over.

Presumably it will get converted at SOME point like everything else, and
it won't be a concern anymore.

Presumably that will even happen before we invent a way to screw
ourselves over.

>>
>> In other places, you scale down with X / BDRV_SECTOR_SIZE or an
>> equivalent bitshift, why does this receive a round *up* treatment?
> 
> Really? I thought I was being careful in this patch about ALWAYS
> preserving the same sector value - the only time I did a bitshift was if
> I already asserted the answer was aligned to sector boundaries to begin
> with, and when I wasn't sure, it should be a round up.
> 

You're probably right; I hadn't been paying perfectly close attention
to exactly how you'd been scaling the values. This one simply stood out
to me as slightly strange because of the DIV_ROUND_UP macro, but maybe
I'm misidentifying why.

>>
>> Are we considering a future in which this function might actually give
>> us some unaligned answers? Do we need to re-audit all of the callers at
>> that point to make sure they cope appropriately?
> 
> Yes, I've been trying to do that all along, but a second set of eyes
> never hurts.  Or, as I said, we can play it safe by guaranteeing that
> even when byte-based, the block layer enforces sector-aligned answers.
> 

Just making sure I'm clear that we expect this function to do so;
clearly we're (You're*) removing sector assumptions from as many places
as we (You) can.

Thanks for playing whack-a-mole with stupid questions, I think I'm
on-board now.

--js


* Re: [Qemu-devel] [Qemu-block] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based
  2017-04-11 22:29 ` [Qemu-devel] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based Eric Blake
@ 2017-04-24 23:06   ` John Snow
  2017-04-25  1:48     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: John Snow @ 2017-04-24 23:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: kwolf, Changlong Xie, Fam Zheng, Wen Congyang, qemu-block,
	Max Reitz, Stefan Hajnoczi



On 04/11/2017 06:29 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, for the most part this patch is just the
> addition of scaling at the callers followed by inverse scaling at
> bdrv_is_allocated().  But some code, particularly stream_run(),
> gets a lot simpler because it no longer has to mess with sectors.
> 
> For ease of review, bdrv_is_allocated() was tackled separately.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/block.h |  2 +-
>  block/commit.c        | 20 ++++++++------------
>  block/io.c            | 36 +++++++++++++++---------------------
>  block/mirror.c        |  5 ++++-
>  block/replication.c   | 17 ++++++++++++-----
>  block/stream.c        | 21 +++++++++------------
>  qemu-img.c            | 10 +++++++---
>  7 files changed, 56 insertions(+), 55 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 8641149..740cb86 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -425,7 +425,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>  int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                        int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
> -                            int64_t sector_num, int nb_sectors, int *pnum);
> +                            int64_t offset, int64_t bytes, int64_t *pnum);
> 
>  bool bdrv_is_read_only(BlockDriverState *bs);
>  bool bdrv_is_sg(BlockDriverState *bs);
> diff --git a/block/commit.c b/block/commit.c
> index 4d6bb2a..989de7d 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -132,7 +132,7 @@ static void coroutine_fn commit_run(void *opaque)
>      int64_t offset;
>      uint64_t delay_ns = 0;
>      int ret = 0;
> -    int n = 0; /* sectors */
> +    int64_t n = 0; /* bytes */
>      void *buf = NULL;
>      int bytes_written = 0;
>      int64_t base_len;
> @@ -157,7 +157,7 @@ static void coroutine_fn commit_run(void *opaque)
> 
>      buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
> 
> -    for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
> +    for (offset = 0; offset < s->common.len; offset += n) {
>          bool copy;
> 
>          /* Note that even when no rate limit is applied we need to yield
> @@ -169,15 +169,12 @@ static void coroutine_fn commit_run(void *opaque)
>          }
>          /* Copy if allocated above the base */
>          ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
> -                                      offset / BDRV_SECTOR_SIZE,
> -                                      COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> -                                      &n);
> +                                      offset, COMMIT_BUFFER_SIZE, &n);
>          copy = (ret == 1);
> -        trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
> +        trace_commit_one_iteration(s, offset, n, ret);
>          if (copy) {
> -            ret = commit_populate(s->top, s->base, offset,
> -                                  n * BDRV_SECTOR_SIZE, buf);
> -            bytes_written += n * BDRV_SECTOR_SIZE;
> +            ret = commit_populate(s->top, s->base, offset, n, buf);
> +            bytes_written += n;
>          }
>          if (ret < 0) {
>              BlockErrorAction action =
> @@ -190,11 +187,10 @@ static void coroutine_fn commit_run(void *opaque)
>              }
>          }
>          /* Publish progress */
> -        s->common.offset += n * BDRV_SECTOR_SIZE;
> +        s->common.offset += n;
> 
>          if (copy && s->common.speed) {
> -            delay_ns = ratelimit_calculate_delay(&s->limit,
> -                                                 n * BDRV_SECTOR_SIZE);
> +            delay_ns = ratelimit_calculate_delay(&s->limit, n);
>          }
>      }
> 
> diff --git a/block/io.c b/block/io.c
> index 438a493..9218329 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1930,52 +1930,46 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>  /*
>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>   *
> - * Return true if the given sector is allocated in any image between
> + * Return true if the given offset is allocated in any image between

perhaps "range" instead of "offset"?

>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
> - * sector is allocated in any image of the chain.  Return false otherwise,
> + * offset is allocated in any image of the chain.  Return false otherwise,
>   * or negative errno on failure.
>   *
> - * 'pnum' is set to the number of sectors (including and immediately following
> - *  the specified sector) that are known to be in the same
> + * 'pnum' is set to the number of bytes (including and immediately following
> + *  the specified offset) that are known to be in the same
>   *  allocated/unallocated state.
>   *
>   */
>  int bdrv_is_allocated_above(BlockDriverState *top,
>                              BlockDriverState *base,
> -                            int64_t sector_num,
> -                            int nb_sectors, int *pnum)
> +                            int64_t offset, int64_t bytes, int64_t *pnum)
>  {
>      BlockDriverState *intermediate;
> -    int ret, n = nb_sectors;
> +    int ret;
> +    int64_t n = bytes;
> 
>      intermediate = top;
>      while (intermediate && intermediate != base) {
>          int64_t pnum_inter;
> -        int psectors_inter;
> 
> -        ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
> -                                nb_sectors * BDRV_SECTOR_SIZE,
> -                                &pnum_inter);
> +        ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
>          if (ret < 0) {
>              return ret;
>          }
> -        assert(pnum_inter < INT_MAX * BDRV_SECTOR_SIZE);
> -        psectors_inter = pnum_inter >> BDRV_SECTOR_BITS;
>          if (ret) {
> -            *pnum = psectors_inter;
> +            *pnum = pnum_inter;
>              return 1;
>          }
> 
>          /*
> -         * [sector_num, nb_sectors] is unallocated on top but intermediate
> -         * might have
> -         *
> -         * [sector_num+x, nr_sectors] allocated.
> +         * [offset, bytes] is unallocated on top but intermediate
> +         * might have [offset+x, bytes-x] allocated.
>           */
> -        if (n > psectors_inter &&
> +        if (n > pnum_inter &&
>              (intermediate == top ||
> -             sector_num + psectors_inter < intermediate->total_sectors)) {



> -            n = psectors_inter;
> +             offset + pnum_inter < (intermediate->total_sectors *
> +                                    BDRV_SECTOR_SIZE))) {

Naive question: not worth using either bdrv_getlength for bytes, or the
bdrv_nb_sectors helpers?

(Not sure if this is appropriate due to the variable length calls which
might disqualify it from internal usage)

> +            n = pnum_inter;
>          }
> 
>          intermediate = backing_bs(intermediate);
> diff --git a/block/mirror.c b/block/mirror.c
> index 8de0492..c92335a 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -609,6 +609,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>      BlockDriverState *bs = s->source;
>      BlockDriverState *target_bs = blk_bs(s->target);
>      int ret, n;
> +    int64_t count;
> 
>      end = s->bdev_length / BDRV_SECTOR_SIZE;
> 
> @@ -658,11 +659,13 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>              return 0;
>          }
> 
> -        ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
> +        ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
> +                                      nb_sectors * BDRV_SECTOR_SIZE, &count);
>          if (ret < 0) {
>              return ret;
>          }
> 
> +        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>          assert(n > 0);
>          if (ret == 1) {
>              bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
> diff --git a/block/replication.c b/block/replication.c
> index 414ecc4..605d90f 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -264,7 +264,8 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
>      BdrvChild *top = bs->file;
>      BdrvChild *base = s->secondary_disk;
>      BdrvChild *target;
> -    int ret, n;
> +    int ret;
> +    int64_t n;
> 
>      ret = replication_get_io_status(s);
>      if (ret < 0) {
> @@ -283,14 +284,20 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
>       */
>      qemu_iovec_init(&hd_qiov, qiov->niov);
>      while (remaining_sectors > 0) {
> -        ret = bdrv_is_allocated_above(top->bs, base->bs, sector_num,
> -                                      remaining_sectors, &n);
> +        int64_t count;
> +
> +        ret = bdrv_is_allocated_above(top->bs, base->bs,
> +                                      sector_num * BDRV_SECTOR_SIZE,
> +                                      remaining_sectors * BDRV_SECTOR_SIZE,
> +                                      &count);
>          if (ret < 0) {
>              goto out1;
>          }
> 
> +        assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
> +        n = count >> BDRV_SECTOR_BITS;
>          qemu_iovec_reset(&hd_qiov);
> -        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
> +        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, count);
> 
>          target = ret ? top : base;
>          ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
> @@ -300,7 +307,7 @@ static coroutine_fn int replication_co_writev(BlockDriverState *bs,
> 
>          remaining_sectors -= n;
>          sector_num += n;
> -        bytes_done += n * BDRV_SECTOR_SIZE;
> +        bytes_done += count;
>      }
> 
>  out1:
> diff --git a/block/stream.c b/block/stream.c
> index 85502eb..9033655 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -112,7 +112,7 @@ static void coroutine_fn stream_run(void *opaque)
>      uint64_t delay_ns = 0;
>      int error = 0;
>      int ret = 0;
> -    int n = 0; /* sectors */
> +    int64_t n = 0; /* bytes */
>      void *buf;
> 
>      if (!bs->backing) {
> @@ -136,9 +136,8 @@ static void coroutine_fn stream_run(void *opaque)
>          bdrv_enable_copy_on_read(bs);
>      }
> 
> -    for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
> +    for ( ; offset < s->common.len; offset += n) {
>          bool copy;
> -        int64_t count = 0;
> 
>          /* Note that even when no rate limit is applied we need to yield
>           * with no pending I/O here so that bdrv_drain_all() returns.
> @@ -150,26 +149,25 @@ static void coroutine_fn stream_run(void *opaque)
> 
>          copy = false;
> 
> -        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
> -        n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
> +        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &n);
>          if (ret == 1) {
>              /* Allocated in the top, no need to copy.  */
>          } else if (ret >= 0) {
>              /* Copy if allocated in the intermediate images.  Limit to the
>               * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
>              ret = bdrv_is_allocated_above(backing_bs(bs), base,
> -                                          offset / BDRV_SECTOR_SIZE, n, &n);
> +                                          offset, n, &n);
> 
>              /* Finish early if end of backing file has been reached */
>              if (ret == 0 && n == 0) {
> -                n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
> +                n = s->common.len - offset;
>              }
> 
>              copy = (ret == 1);
>          }
> -        trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
> +        trace_stream_one_iteration(s, offset, n, ret);
>          if (copy) {
> -            ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
> +            ret = stream_populate(blk, offset, n, buf);
>          }
>          if (ret < 0) {
>              BlockErrorAction action =
> @@ -188,10 +186,9 @@ static void coroutine_fn stream_run(void *opaque)
>          ret = 0;
> 
>          /* Publish progress */
> -        s->common.offset += n * BDRV_SECTOR_SIZE;
> +        s->common.offset += n;
>          if (copy && s->common.speed) {
> -            delay_ns = ratelimit_calculate_delay(&s->limit,
> -                                                 n * BDRV_SECTOR_SIZE);
> +            delay_ns = ratelimit_calculate_delay(&s->limit, n);
>          }
>      }
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 2f21d69..d96b4d6 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1448,12 +1448,16 @@ static int img_compare(int argc, char **argv)
>          }
> 
>          for (;;) {
> +            int64_t count;
> +
>              nb_sectors = sectors_to_process(total_sectors_over, sector_num);
>              if (nb_sectors <= 0) {
>                  break;
>              }
> -            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL, sector_num,
> -                                          nb_sectors, &pnum);
> +            ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL,
> +                                          sector_num * BDRV_SECTOR_SIZE,
> +                                          nb_sectors * BDRV_SECTOR_SIZE,
> +                                          &count);
>              if (ret < 0) {
>                  ret = 3;
>                  error_report("Sector allocation test failed for %s",
> @@ -1461,7 +1465,7 @@ static int img_compare(int argc, char **argv)
>                  goto out;
> 
>              }
> -            nb_sectors = pnum;
> +            nb_sectors = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>              if (ret) {
>                  ret = check_empty_sectors(blk_over, sector_num, nb_sectors,
>                                            filename_over, buf1, quiet);
> 

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based
  2017-04-24 23:06   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2017-04-25  1:48     ` Eric Blake
  2017-05-10 15:42       ` Eric Blake
  2017-05-10 22:11       ` Eric Blake
  0 siblings, 2 replies; 38+ messages in thread
From: Eric Blake @ 2017-04-25  1:48 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Changlong Xie, Fam Zheng, Wen Congyang, qemu-block,
	Max Reitz, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2984 bytes --]

On 04/24/2017 06:06 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> We are gradually moving away from sector-based interfaces, towards
>> byte-based.  In the common case, allocation is unlikely to ever use
>> values that are not naturally sector-aligned, but it is possible
>> that byte-based values will let us be more precise about allocation
>> at the end of an unaligned file that can do byte-based access.
>>
>> Changing the signature of the function to use int64_t *pnum ensures
>> that the compiler enforces that all callers are updated.  For now,
>> the io.c layer still assert()s that all callers are sector-aligned,
>> but that can be relaxed when a later patch implements byte-based
>> block status.  Therefore, for the most part this patch is just the
>> addition of scaling at the callers followed by inverse scaling at
>> bdrv_is_allocated().  But some code, particularly stream_run(),
>> gets a lot simpler because it no longer has to mess with sectors.
>>

>> +++ b/block/io.c
>> @@ -1930,52 +1930,46 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>>  /*
>>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>>   *
>> - * Return true if the given sector is allocated in any image between
>> + * Return true if the given offset is allocated in any image between
> 
> perhaps "range" instead of "offset"?
> 
>>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
>> - * sector is allocated in any image of the chain.  Return false otherwise,
>> + * offset is allocated in any image of the chain.  Return false otherwise,
>>   * or negative errno on failure.

Seems reasonable.


>>          /*
>> -         * [sector_num, nb_sectors] is unallocated on top but intermediate
>> -         * might have
>> -         *
>> -         * [sector_num+x, nr_sectors] allocated.
>> +         * [offset, bytes] is unallocated on top but intermediate
>> +         * might have [offset+x, bytes-x] allocated.
>>           */
>> -        if (n > psectors_inter &&
>> +        if (n > pnum_inter &&
>>              (intermediate == top ||
>> -             sector_num + psectors_inter < intermediate->total_sectors)) {
> 
> 
> 
>> -            n = psectors_inter;
>> +             offset + pnum_inter < (intermediate->total_sectors *
>> +                                    BDRV_SECTOR_SIZE))) {
> 
> Naive question: not worth using either bdrv_getlength for bytes, or the
> bdrv_nb_sectors helpers?

bdrv_getlength(intermediate) should indeed be the same as
intermediate->total_sectors * BDRV_SECTOR_SIZE (for now - ultimately it
would be nice to track a byte length rather than a sector length for
images). I can make that cleanup for v2.

> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based
  2017-04-25  1:48     ` Eric Blake
@ 2017-05-10 15:42       ` Eric Blake
  2017-05-10 22:11       ` Eric Blake
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-05-10 15:42 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Changlong Xie, Fam Zheng, Wen Congyang, qemu-block,
	Max Reitz, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 6019 bytes --]

On 04/24/2017 08:48 PM, Eric Blake wrote:
> On 04/24/2017 06:06 PM, John Snow wrote:
>>
>>
>> On 04/11/2017 06:29 PM, Eric Blake wrote:
>>> We are gradually moving away from sector-based interfaces, towards
>>> byte-based.  In the common case, allocation is unlikely to ever use
>>> values that are not naturally sector-aligned, but it is possible
>>> that byte-based values will let us be more precise about allocation
>>> at the end of an unaligned file that can do byte-based access.
>>>
>>> Changing the signature of the function to use int64_t *pnum ensures
>>> that the compiler enforces that all callers are updated.  For now,
>>> the io.c layer still assert()s that all callers are sector-aligned,
>>> but that can be relaxed when a later patch implements byte-based
>>> block status.  Therefore, for the most part this patch is just the
>>> addition of scaling at the callers followed by inverse scaling at
>>> bdrv_is_allocated().  But some code, particularly stream_run(),
>>> gets a lot simpler because it no longer has to mess with sectors.
>>>
> 
>>> +++ b/block/io.c
>>> @@ -1930,52 +1930,46 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>>>  /*
>>>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>>>   *
>>> - * Return true if the given sector is allocated in any image between
>>> + * Return true if the given offset is allocated in any image between
>>
>> perhaps "range" instead of "offset"?
>>
>>>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
>>> - * sector is allocated in any image of the chain.  Return false otherwise,
>>> + * offset is allocated in any image of the chain.  Return false otherwise,
>>>   * or negative errno on failure.
> 
> Seems reasonable.

Actually, not quite. Suppose we have the following sector allocations:

backing: -- -- XX --
active:  -- -- -- XX

bdrv_is_allocated_above(active, NULL, 0, 2048, &num)

will return 0 with num set to 1024 (offset is not allocated, and the
not-allocated range is at least 1024 bytes).  But calling

bdrv_is_allocated_above(backing, NULL, 1024, 1024, &num)

will return 1 with num set to 512 (not the entire range, but the
512-byte prefix of the range, IS allocated).  Meanwhile, note that:

bdrv_is_allocated_above(active, NULL, 1024, 1024, &num)

will ALSO return 1 with num set to 512, even though it would be nicer if
it could return 1024 (from the active layer, all 1024 bytes of the given
range ARE allocated, just not from the same location).  So callers have
to manually coalesce multiple bdrv_is_allocated_above() calls to get a
full picture of what is allocated.
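The coalescing that callers must do can be sketched with a standalone toy
model (plain C; is_allocated_above() and coalesce() are hypothetical
stand-ins for illustration, not the real QEMU API):

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR 512
#define NSEC 4

/* Toy allocation maps matching the example above:
 * backing: -- -- XX --
 * active:  -- -- -- XX
 */
static const bool backing[NSEC] = { false, false, true, false };
static const bool active[NSEC]  = { false, false, false, true };

/* Hypothetical stand-in for bdrv_is_allocated_above(): reports whether
 * the first sector of [offset, offset+bytes) is allocated anywhere in
 * the chain, and via *pnum how many bytes share BOTH that status and
 * the same answering layer -- mimicking how the real primitive stops
 * at every per-layer transition. */
static int is_allocated_above(int64_t offset, int64_t bytes, int64_t *pnum)
{
    int first = (int)(offset / SECTOR);
    int layer = active[first] ? 1 : (backing[first] ? 2 : 0);
    int64_t n = 0;

    for (int i = first; i < NSEC && (int64_t)i * SECTOR < offset + bytes; i++) {
        int l = active[i] ? 1 : (backing[i] ? 2 : 0);
        if (l != layer) {
            break;
        }
        n += SECTOR;
    }
    *pnum = n;
    return layer != 0;
}

/* What callers like 'qemu-io alloc' end up doing by hand: loop and
 * merge consecutive answers that agree on allocated-vs-not, ignoring
 * which layer each piece came from. */
static int coalesce(int64_t offset, int64_t end, int64_t *total)
{
    int64_t n;
    int first = is_allocated_above(offset, end - offset, &n);

    *total = 0;
    while (offset < end &&
           is_allocated_above(offset, end - offset, &n) == first) {
        *total += n;
        offset += n;
    }
    return first;
}
```

On this model, a single raw query at offset 1024 stops after 512 bytes
(layer change), while coalesce(1024, 2048, &total) returns 1 with total
covering the full 1024 allocated bytes.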

The same is ALSO true if you have a fragmented image.  For example:

$ qemu-img create -f qcow2 -o cluster_size=1m file3 10m
$ qemu-io -f qcow2 -c 'w 0 5m' -c 'discard 0 2m' -c 'w 1m 1m' \
  -c 'w 0 1m' -c 'w 8m 2m' file3

The image is now fragmented (the clusters at 0 and 1m swapped mappings):

[{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data":
true, "offset": 6291456},
{ "start": 1048576, "length": 1048576, "depth": 0, "zero": false,
"data": true, "offset": 5242880},
{ "start": 2097152, "length": 3145728, "depth": 0, "zero": false,
"data": true, "offset": 7340032},
{ "start": 5242880, "length": 3145728, "depth": 0, "zero": true, "data":
false},
{ "start": 8388608, "length": 2097152, "depth": 0, "zero": false,
"data": true, "offset": 10485760}]


'qemu-io alloc' and 'qemu-io map' are callers which coalesce similar
results:

$ ./qemu-io -f qcow2 file3
qemu-io> alloc 0 10m
7340032/10485760 bytes allocated at offset 0 bytes
qemu-io> map
5 MiB (0x500000) bytes     allocated at offset 0 bytes (0x0)
3 MiB (0x300000) bytes not allocated at offset 5 MiB (0x500000)
2 MiB (0x200000) bytes     allocated at offset 8 MiB (0x800000)

although tracing under gdb showed 5 calls to bdrv_is_allocated() during
'alloc' (at [0, 10m], [1m, 9m], [2m, 8m], [5m, 5m], [8m, 2m]), and 8
calls during 'map' (at [0, 10m], [1m, 9m], [2m, 8m], [5m, 5m], [5m, 5m],
[8m, 2m], [8m, 2m], [10m, 0]).  Part of that is that the qemu-io map
implementation is rather inefficient - it makes a wasted query for 0
bytes, and at every transition between allocated and unallocated it ends
up asking for status on the same offset a second time around, rather
than remembering what it already learned on the previous iteration.

It might be worth followup patches to improve the efficiency of qemu-io
map; but also to change semantics such that bdrv_is_allocated_above()
gives the largest answer possible, rather than making the callers
coalesce identical answers.  Part of that would include teaching
.bdrv_co_get_block_status() new semantics: if file==NULL, then don't
return BDRV_BLOCK_OFFSET_VALID, but merely grab as much
BDRV_BLOCK_DATA/BDRV_BLOCK_ZERO information as possible (in spite of
fragmentation) [in the example above, it would mean returning
BDRV_BLOCK_DATA for 5m, rather than three separate returns of 1m/1m/3m];
as well as teaching bdrv_is_allocated_above() to concatenate all answers
along the backing chain until it reaches an actual change in status.
Looks like I've just added more work to my queue.

That said, there still has to be coalescing somewhere.  For example,
qcow2 images can easily return status for any range covered by a single
L2 cluster, but the driver intentionally quits its response at the end of an L2
cluster because the cost of loading up the next cluster (with the
potential of dropping locks and racing with other threads doing writes)
becomes too hard to control within the qcow2 layer, and callers may need
to realize that the larger a request is on an actively-changing image,
the more likely the overall response can be inaccurate by the time
multiple sub-queries have been coalesced.
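The L2 boundary behavior is easy to model: per the qcow2 spec, an L2
table of cluster_size bytes holds cluster_size/8 eight-byte entries, each
mapping one cluster, so one table covers cluster_size^2/8 guest bytes,
and a status reply is clamped to that window (a sketch of the rule, not
the actual qcow2.c code):

```c
#include <stdint.h>

/* Guest bytes covered by one qcow2 L2 table: cluster_size / 8 entries
 * of 8 bytes each, each mapping one cluster of cluster_size bytes. */
static int64_t l2_coverage(int64_t cluster_size)
{
    return cluster_size * (cluster_size / 8);
}

/* Clamp a status request so it never crosses into the next L2 table --
 * the reason a single qcow2 status reply can stop "early" even when the
 * following clusters happen to have the same status. */
static int64_t clamp_to_l2(int64_t offset, int64_t bytes, int64_t cluster_size)
{
    int64_t span = l2_coverage(cluster_size);
    int64_t remaining = span - offset % span;

    return bytes < remaining ? bytes : remaining;
}
```

With the default 64k clusters, one L2 table covers 512 MiB, so a request
starting 4096 bytes before that boundary gets clamped to 4096 no matter
how large it was.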

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based
  2017-04-25  1:48     ` Eric Blake
  2017-05-10 15:42       ` Eric Blake
@ 2017-05-10 22:11       ` Eric Blake
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-05-10 22:11 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Changlong Xie, Fam Zheng, Wen Congyang, qemu-block,
	Max Reitz, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]

On 04/24/2017 08:48 PM, Eric Blake wrote:

>>
>>
>>> -            n = psectors_inter;
>>> +             offset + pnum_inter < (intermediate->total_sectors *
>>> +                                    BDRV_SECTOR_SIZE))) {
>>
>> Naive question: not worth using either bdrv_getlength for bytes, or the
>> bdrv_nb_sectors helpers?
> 
> bdrv_getlength(intermediate) should indeed be the same as
> intermediate->total_sectors * BDRV_SECTOR_SIZE (for now - ultimately it
> would be nice to track a byte length rather than a sector length for
> images). I can make that cleanup for v2.

This one's tricky.  Calling bdrv_nb_sectors()/bdrv_getlength() (the two
are identical other than scale) guarantees that you have a sane answer
for a variably-sized BDS, but can fail with -ENOMEDIUM.  Which, given
the position in the 'if' clause, makes it a difficult rewrite to
properly catch.  On the other hand, since we just barely called
bdrv_is_allocated(intermediate), which in turn called
bdrv_co_get_block_status(), and that calls bdrv_nb_sectors(), we are
assured that intermediate->total_sectors has not changed in the meantime.

So my options are to either add a big comment stating why we are safe,
or to use bdrv_getlength() anyways but with proper error checking.  Best
done as a separate patch from the conversion in scale; so I'll do that
for v2.
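A sketch of the shape that v2 cleanup has to take: the length query can
fail, so it must be hoisted out of the 'if' condition and error-checked
first (getlength_stub() is a hypothetical stand-in for bdrv_getlength(),
not the real API):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV  /* fallback on platforms lacking ENOMEDIUM */
#endif

/* Hypothetical stand-in for bdrv_getlength(): byte length on success,
 * negative errno (e.g. -ENOMEDIUM) on failure. */
static int64_t getlength_stub(bool medium_present, int64_t total_sectors)
{
    return medium_present ? total_sectors * BDRV_SECTOR_SIZE : -ENOMEDIUM;
}

/* The condition from the patch, rewritten against a byte length with
 * proper error checking: >0 if [offset, offset+pnum_inter) ends short
 * of EOF, 0 if not, negative errno if the length query failed. */
static int range_ends_short(int64_t offset, int64_t pnum_inter,
                            bool medium_present, int64_t total_sectors)
{
    int64_t len = getlength_stub(medium_present, total_sectors);

    if (len < 0) {
        return (int)len;    /* can't silently ignore -ENOMEDIUM here */
    }
    return offset + pnum_inter < len;
}
```

The awkward part is exactly the early return: reading total_sectors
directly never fails, so it can live inside the 'if' clause, while the
length call needs its own statement and error path.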

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based
  2017-04-19 21:40       ` John Snow
@ 2017-05-10 22:33         ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2017-05-10 22:33 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, Fam Zheng, qemu-block, Juan Quintela,
	Dr. David Alan Gilbert, Max Reitz, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 4540 bytes --]

On 04/19/2017 04:40 PM, John Snow wrote:
> 
> 
> On 04/19/2017 05:12 PM, Eric Blake wrote:
>> On 04/19/2017 03:32 PM, John Snow wrote:
>>
>>>> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
>>>>          /* Skip unallocated sectors; intentionally treats failure as
>>>>           * an allocated sector */
>>>>          while (cur_sector < total_sectors &&
>>>> -               !bdrv_is_allocated(blk_bs(bb), cur_sector,
>>>> -                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>>>> -            cur_sector += nr_sectors;
>>>> +               !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>>>> +                                  MAX_IS_ALLOCATED_SEARCH, &count)) {
>>>> +            cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>>>>          }
>>>
>>> Let's try asking again with the right vocabulary:
>>>
>>> OK, so I guess the idea is that we definitely shouldn't ever have a
>>> partially allocated sector; so I suppose this ROUND_UP is just a precaution?
>>
>> For now, yes - but the part 3 series (bdrv_get_block_status) makes
>> returning a sub-sector value possible.  For most drivers, it will only
>> happen at end-of-file (as status doesn't change mid-sector unless your
>> file is not a multiple of a sector size) - in fact, I found that I had
>> to patch the block layer to round up any sub-sector return that
>> coincides with end-of-file back to a round sector amount in order to
>> avoid breaking other code that assumed everything is allocated in sectors.
> 
> I'm sorry, I'm having difficulty parsing your last sentence.

We should NEVER have an allocation smaller than
bds->bl.request_alignment (for example, if you can't write in chunks
smaller than 512, then how can your allocation be smaller than that?).
But when we eventually have byte-granularity support, and for a device
where request_alignment is 1, it is feasible that we could track
mid-sector alignment changes.  In practice, we DO have an
easy-to-observe mid-sector alignment change: on a regular file not
aligned to sector boundaries, where request_alignment IS 1, we get a
mid-sector transition from data to hole at EOF.  So it turned out to be
easier in my later series on bdrv_get_block_status() to intentionally
round a partial sector at end-of-file up as if the entire sector was
allocated, rather than reporting the mid-sector alignment (matching the
fact that bdrv_getlength() rounds up to sector boundaries).
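The EOF fixup described above fits in a few lines (a sketch of the rule,
with fixup_eof_pnum() as a hypothetical helper name, not the actual io.c
change):

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* If a status answer stops exactly at a non-sector-aligned EOF, round
 * it back up to a full sector, so sector-based callers never observe a
 * mid-sector transition. */
static int64_t fixup_eof_pnum(int64_t offset, int64_t pnum, int64_t file_len)
{
    if (offset + pnum == file_len && file_len % BDRV_SECTOR_SIZE) {
        pnum = DIV_ROUND_UP(pnum, BDRV_SECTOR_SIZE) * BDRV_SECTOR_SIZE;
    }
    return pnum;
}
```

For a 1000-byte file, a data answer of 488 bytes at offset 512 hits EOF
and is reported as 512; answers that end mid-file or at an aligned EOF
pass through unchanged.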

If, someday, we fix bdrv_getlength() to report a non-sector-aligned
value, we'll have to revisit the rounding (again, it feels like
bds->bl.request_alignment is the obvious place to ensure that we never
report a mid-alignment result, but where the alignment size is now
driver-dependent instead of hard-coded to 512).  In fact, even with
non-sector-aligned bdrv_getlength(), we may STILL be able to enforce
rules such as mid-sector transitions from hole->data are not possible
(regardless of request_alignment), and mid-sector transitions from
data->hole are possible only at the end of the file.

> I agree with you that there is absolutely definitely no real problem
> here right now, but I do secretly wonder if we'll invent a way to screw
> ourselves over.
> 
> Presumably it will get converted at SOME point like everything else, and
> it won't be a concern anymore.
> 
> Presumably Presumably that will even happen before we invent a way to
> screw ourselves over.

Here's hoping we are careful enough, and have enough asserts in the
right place so that if we guessed wrong we get a nice assert (rather
than risking inf-looping because we rounded a sub-sector down to 0, or
causing data corruption that doesn't show up until much later where we
rounded up and missed copying something); at the same time, that the
asserts are accurate enough that we can't trip them from legitimate users.
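The inf-looping direction of that risk is concrete: with a sub-sector
answer, rounding down yields 0 and the caller's cursor never advances,
while DIV_ROUND_UP always makes progress for any positive count (toy
demonstration, helper names invented here):

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Advance a sector cursor by a byte count reported from a byte-based
 * query.  Rounding down can return 0 and hang a while-loop like the
 * one in mig_save_device_bulk(); rounding up always moves forward. */
static int64_t advance_down(int64_t count)
{
    return count / BDRV_SECTOR_SIZE;
}

static int64_t advance_up(int64_t count)
{
    return DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
}
```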

One thing I plan to add to my v2 posting (for the part 3
bdrv_get_block_status series) is to remove the 'qemu-io alloc' sector
alignment restrictions, to have a tool that makes it easier to prove
that I am properly handling rounding at the block layer out to
request_alignment limits without triggering any asserts.  And my recent
blkdebug enhancements are probably going to be helpful in validating
things, even if I have to add more iotests.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




end of thread, other threads:[~2017-05-10 22:34 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 22:29 [Qemu-devel] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 01/17] blockjob: Track job ratelimits via bytes, not sectors Eric Blake
2017-04-17 19:18   ` [Qemu-devel] [Qemu-block] " John Snow
2017-04-17 19:51     ` Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 02/17] trace: Show blockjob actions " Eric Blake
2017-04-17 19:18   ` [Qemu-devel] [Qemu-block] " John Snow
2017-04-17 19:55     ` Eric Blake
2017-04-19  9:12       ` Stefan Hajnoczi
2017-04-11 22:29 ` [Qemu-devel] [PATCH 03/17] stream: Switch stream_populate() to byte-based Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 04/17] stream: Switch stream_run() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 05/17] commit: Switch commit_populate() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 06/17] commit: Switch commit_run() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 07/17] mirror: Switch MirrorBlockJob " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 08/17] mirror: Switch mirror_do_zero_or_discard() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 09/17] mirror: Switch mirror_cow_align() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 10/17] mirror: Switch mirror_do_read() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 11/17] mirror: Switch mirror_iteration() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 12/17] backup: Switch BackupBlockJob " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 13/17] backup: Switch block_backup.h " Eric Blake
2017-04-17 23:24   ` [Qemu-devel] [Qemu-block] " John Snow
2017-04-18  0:54     ` Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 14/17] backup: Switch backup_do_cow() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 15/17] backup: Switch backup_run() " Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 16/17] block: Make bdrv_is_allocated() byte-based Eric Blake
2017-04-18 22:15   ` [Qemu-devel] [Qemu-block] " John Snow
2017-04-19 17:54     ` Eric Blake
2017-04-19 19:37       ` John Snow
2017-04-19 20:32   ` John Snow
2017-04-19 21:12     ` Eric Blake
2017-04-19 21:40       ` John Snow
2017-05-10 22:33         ` Eric Blake
2017-04-11 22:29 ` [Qemu-devel] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based Eric Blake
2017-04-24 23:06   ` [Qemu-devel] [Qemu-block] " John Snow
2017-04-25  1:48     ` Eric Blake
2017-05-10 15:42       ` Eric Blake
2017-05-10 22:11       ` Eric Blake
2017-04-17 23:42 ` [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based John Snow
2017-04-18  1:04   ` Eric Blake
