* [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes
@ 2016-05-24 22:25 Eric Blake
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 01/13] block: Rename blk_write_zeroes() Eric Blake
                   ` (14 more replies)
  0 siblings, 15 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf

Kevin pointed out that my recent change of blk_write_zeroes() from
sector-based to byte-based (commit 983a1600) makes life harder as
long as bdrv_write_zeroes() is still sector-based, because the
compiler doesn't flag any change in parameter types.
Complete the conversion by renaming things (so the compiler
will help flag any future rebase conflicts) and making all
write_zeroes operations nominally take bytes.
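
To illustrate the hazard described above, here is a minimal sketch. All names
(demo_pwrite_zeroes, demo_zero_sectors, DEMO_SECTOR_BITS) are hypothetical
stand-ins, not QEMU functions. It assumes 512-byte sectors, and shows that a
sector pair and a byte pair share the same C types, so only the explicit shift
(or a rename) separates a correct call from a silently wrong one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte sectors, as BDRV_SECTOR_BITS implies in the diffs. */
#define DEMO_SECTOR_BITS 9

/* Hypothetical byte-based entry point: returns the end offset in bytes. */
static int64_t demo_pwrite_zeroes(int64_t offset, int count)
{
    assert(offset >= 0 && count >= 0);
    return offset + count;
}

/* A caller that still thinks in sectors has to scale both arguments.
 * Passing sector_num/nb_sectors straight through would also compile,
 * because the parameter types are identical. */
static int64_t demo_zero_sectors(int64_t sector_num, int nb_sectors)
{
    return demo_pwrite_zeroes(sector_num << DEMO_SECTOR_BITS,
                              nb_sectors << DEMO_SECTOR_BITS);
}
```

With these definitions, demo_zero_sectors(2, 4) covers bytes up to offset
3072, while an unscaled demo_pwrite_zeroes(2, 4) would stop at byte 6 without
any compiler diagnostic.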

Definitely conflicts with Denis' qcow2_co_write_zeroes improvements
series, and probably with Kevin's conversion of block jobs to
BlockBackend. I can rebase if those land on the block branch first.

Eric Blake (13):
  block: Rename blk_write_zeroes()
  block: Track write zero limits in bytes
  block: Add .bdrv_co_pwrite_zeroes()
  block: Switch bdrv_write_zeroes() to byte interface
  iscsi: Convert to bdrv_co_pwrite_zeroes()
  qcow2: Convert to bdrv_co_pwrite_zeroes()
  blkreplay: Convert to bdrv_co_pwrite_zeroes()
  gluster: Convert to bdrv_co_pwrite_zeroes()
  qed: Convert to bdrv_co_pwrite_zeroes()
  raw-posix: Convert to bdrv_co_pwrite_zeroes()
  raw_bsd: Convert to bdrv_co_pwrite_zeroes()
  vmdk: Convert to bdrv_co_pwrite_zeroes()
  block: Kill bdrv_co_write_zeroes()

 include/block/block.h          |  16 +++----
 include/block/block_int.h      |  14 +++---
 include/sysemu/block-backend.h |  14 +++---
 block/backup.c                 |   7 +--
 block/blkreplay.c              |   8 ++--
 block/block-backend.c          |  14 +++---
 block/gluster.c                |  15 +++---
 block/io.c                     | 106 ++++++++++++++++++++++-------------------
 block/iscsi.c                  |  59 +++++++++++++----------
 block/mirror.c                 |   7 +--
 block/parallels.c              |   8 ++--
 block/qcow2-cluster.c          |   3 +-
 block/qcow2.c                  |  57 +++++++++++-----------
 block/qed.c                    |  27 ++++++-----
 block/raw-posix.c              |  37 ++++++--------
 block/raw_bsd.c                |  10 ++--
 block/vmdk.c                   |  19 ++++----
 hw/scsi/scsi-disk.c            |   2 +-
 migration/block.c              |   5 +-
 qemu-img.c                     |   4 +-
 qemu-io-cmds.c                 |  22 ++++-----
 tests/qemu-iotests/034         |   2 +-
 tests/qemu-iotests/154         |   2 +-
 trace-events                   |   6 +--
 24 files changed, 238 insertions(+), 226 deletions(-)

-- 
2.5.5


* [Qemu-devel] [PATCH 01/13] block: Rename blk_write_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes Eric Blake
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Max Reitz, Stefan Hajnoczi, Denis V. Lunev,
	Paolo Bonzini

Commit 983a1600 changed the semantics of blk_write_zeroes() to
be byte-based rather than sector-based, but did not change the
name, which is an open invitation for other code to misuse the
function.  Renaming to pwrite_zeroes() makes it more in line
with other byte-based interfaces, and will help make it easier
to track which remaining write_zeroes interfaces still need
conversion.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/sysemu/block-backend.h | 14 +++++++-------
 block/block-backend.c          | 14 +++++++-------
 block/parallels.c              |  4 ++--
 hw/scsi/scsi-disk.c            |  2 +-
 qemu-img.c                     |  4 ++--
 qemu-io-cmds.c                 | 22 +++++++++++-----------
 6 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9d6615c..9571726 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -113,11 +113,11 @@ void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
                           int count);
-int blk_write_zeroes(BlockBackend *blk, int64_t offset,
-                     int count, BdrvRequestFlags flags);
-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
-                                 int count, BdrvRequestFlags flags,
-                                 BlockCompletionFunc *cb, void *opaque);
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                      int count, BdrvRequestFlags flags);
+BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                                  int count, BdrvRequestFlags flags,
+                                  BlockCompletionFunc *cb, void *opaque);
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count);
 int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
                BdrvRequestFlags flags);
@@ -195,8 +195,8 @@ int blk_get_open_flags_from_root_state(BlockBackend *blk);

 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
                   BlockCompletionFunc *cb, void *opaque);
-int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t offset,
-                                     int count, BdrvRequestFlags flags);
+int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                                      int count, BdrvRequestFlags flags);
 int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
                          const uint8_t *buf, int nb_sectors);
 int blk_truncate(BlockBackend *blk, int64_t offset);
diff --git a/block/block-backend.c b/block/block-backend.c
index 4e8298b..b33b8e2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -857,8 +857,8 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
     return ret;
 }

-int blk_write_zeroes(BlockBackend *blk, int64_t offset,
-                     int count, BdrvRequestFlags flags)
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                      int count, BdrvRequestFlags flags)
 {
     return blk_prw(blk, offset, NULL, count, blk_write_entry,
                    flags | BDRV_REQ_ZERO_WRITE);
@@ -973,9 +973,9 @@ static void blk_aio_write_entry(void *opaque)
     blk_aio_complete(acb);
 }

-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
-                                 int count, BdrvRequestFlags flags,
-                                 BlockCompletionFunc *cb, void *opaque)
+BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                                  int count, BdrvRequestFlags flags,
+                                  BlockCompletionFunc *cb, void *opaque)
 {
     return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
                         flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
@@ -1464,8 +1464,8 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
     return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }

-int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t offset,
-                                     int count, BdrvRequestFlags flags)
+int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                                      int count, BdrvRequestFlags flags)
 {
     return blk_co_pwritev(blk, offset, count, NULL,
                           flags | BDRV_REQ_ZERO_WRITE);
diff --git a/block/parallels.c b/block/parallels.c
index 88cface..99fc0f7 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -517,8 +517,8 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     if (ret < 0) {
         goto exit;
     }
-    ret = blk_write_zeroes(file, BDRV_SECTOR_SIZE,
-                           (bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
+    ret = blk_pwrite_zeroes(file, BDRV_SECTOR_SIZE,
+                            (bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
     if (ret < 0) {
         goto exit;
     }
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index ce89c98..a3ecad4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1778,7 +1778,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
                          nb_sectors * s->qdev.blocksize,
                         BLOCK_ACCT_WRITE);
-        r->req.aiocb = blk_aio_write_zeroes(s->qdev.conf.blk,
+        r->req.aiocb = blk_aio_pwrite_zeroes(s->qdev.conf.blk,
                                 r->req.cmd.lba * s->qdev.blocksize,
                                 nb_sectors * s->qdev.blocksize,
                                 flags, scsi_aio_complete, r);
diff --git a/qemu-img.c b/qemu-img.c
index 4792366..35841d6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1606,8 +1606,8 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
             if (s->has_zero_init) {
                 break;
             }
-            ret = blk_write_zeroes(s->target, sector_num << BDRV_SECTOR_BITS,
-                                   n << BDRV_SECTOR_BITS, 0);
+            ret = blk_pwrite_zeroes(s->target, sector_num << BDRV_SECTOR_BITS,
+                                    n << BDRV_SECTOR_BITS, 0);
             if (ret < 0) {
                 return ret;
             }
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e766791..09e879f 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -451,12 +451,12 @@ typedef struct {
     bool done;
 } CoWriteZeroes;

-static void coroutine_fn co_write_zeroes_entry(void *opaque)
+static void coroutine_fn co_pwrite_zeroes_entry(void *opaque)
 {
     CoWriteZeroes *data = opaque;

-    data->ret = blk_co_write_zeroes(data->blk, data->offset, data->count,
-                                    data->flags);
+    data->ret = blk_co_pwrite_zeroes(data->blk, data->offset, data->count,
+                                     data->flags);
     data->done = true;
     if (data->ret < 0) {
         *data->total = data->ret;
@@ -466,8 +466,8 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)
     *data->total = data->count;
 }

-static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
-                              int flags, int64_t *total)
+static int do_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+                               int64_t count, int flags, int64_t *total)
 {
     Coroutine *co;
     CoWriteZeroes data = {
@@ -483,7 +483,7 @@ static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
         return -ERANGE;
     }

-    co = qemu_coroutine_create(co_write_zeroes_entry);
+    co = qemu_coroutine_create(co_pwrite_zeroes_entry);
     qemu_coroutine_enter(co, &data);
     while (!data.done) {
         aio_poll(blk_get_aio_context(blk), true);
@@ -901,7 +901,7 @@ static void write_help(void)
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -u, -- with -z, allow unmapping\n"
-" -z, -- write zeroes using blk_co_write_zeroes\n"
+" -z, -- write zeroes using blk_co_pwrite_zeroes\n"
 "\n");
 }

@@ -1033,7 +1033,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     if (bflag) {
         cnt = do_save_vmstate(blk, buf, offset, count, &total);
     } else if (zflag) {
-        cnt = do_co_write_zeroes(blk, offset, count, flags, &total);
+        cnt = do_co_pwrite_zeroes(blk, offset, count, flags, &total);
     } else if (cflag) {
         cnt = do_write_compressed(blk, buf, offset, count, &total);
     } else {
@@ -1376,7 +1376,7 @@ static void aio_write_help(void)
 " -i, -- treat request as invalid, for exercising stats\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -u, -- with -z, allow unmapping\n"
-" -z, -- write zeroes using blk_aio_write_zeroes\n"
+" -z, -- write zeroes using blk_aio_pwrite_zeroes\n"
 "\n");
 }

@@ -1475,8 +1475,8 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
         }

         ctx->qiov.size = count;
-        blk_aio_write_zeroes(blk, ctx->offset, count, flags, aio_write_done,
-                             ctx);
+        blk_aio_pwrite_zeroes(blk, ctx->offset, count, flags, aio_write_done,
+                              ctx);
     } else {
         nr_iov = argc - optind;
         ctx->buf = create_iovec(blk, &ctx->qiov, &argv[optind], nr_iov,
-- 
2.5.5


* [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 01/13] block: Rename blk_write_zeroes() Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 10:30   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 03/13] block: Add .bdrv_co_pwrite_zeroes() Eric Blake
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

Another step towards removing sector-based interfaces: convert
the maximum write and minimum alignment values from sectors to
bytes.  Alignment is changed to 'int', since it makes no sense
to have an alignment larger than the maximum write.  Add an
assert that no one was trying to use sectors to get a write
zeroes larger than 2G.  Rename the variables to let the compiler
check that all users are converted to the new semantics.
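
As a rough sketch of the unit conversion this implies, the helpers below take
byte-based limits and derive the sector values that a sector-based loop would
still use. The names are hypothetical, and demo_effective_max() is only a
stand-in written under the assumption that a limit of 0 means "no limit
advertised"; it is not the MIN_NON_ZERO() macro itself.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 512-byte sectors. */
#define DEMO_SECTOR_BITS 9

/* Treat 0 as "no limit" and fall back to INT_MAX bytes. */
static int demo_effective_max(int max_pwrite_zeroes)
{
    assert(max_pwrite_zeroes >= 0);
    return max_pwrite_zeroes != 0 ? max_pwrite_zeroes : INT_MAX;
}

/* Byte limit expressed in whole sectors (rounds down). */
static int demo_max_sectors(int max_pwrite_zeroes)
{
    return demo_effective_max(max_pwrite_zeroes) >> DEMO_SECTOR_BITS;
}

/* Byte alignment expressed in sectors; 0 still means "no alignment". */
static int demo_align_sectors(int pwrite_zeroes_alignment)
{
    assert(pwrite_zeroes_alignment >= 0);
    return pwrite_zeroes_alignment >> DEMO_SECTOR_BITS;
}
```

For example, a 64 KiB byte limit or alignment (65536) maps to 128 sectors,
and an unset limit maps to INT_MAX >> 9 sectors.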

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block_int.h |  8 ++++----
 block/io.c                | 27 +++++++++++++++------------
 block/iscsi.c             |  6 ++----
 block/qcow2.c             |  2 +-
 block/qed.c               |  2 +-
 block/vmdk.c              |  6 +++---
 6 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index b6f4755..4282ffd 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -328,11 +328,11 @@ typedef struct BlockLimits {
     /* optimal alignment for discard requests in sectors */
     int64_t discard_alignment;

-    /* maximum number of sectors that can zeroized at once */
-    int max_write_zeroes;
+    /* maximum number of bytes that can zeroized at once */
+    int max_pwrite_zeroes;

-    /* optimal alignment for write zeroes requests in sectors */
-    int64_t write_zeroes_alignment;
+    /* optimal alignment for write zeroes requests in bytes */
+    int pwrite_zeroes_alignment;

     /* optimal transfer length in sectors */
     int opt_transfer_length;
diff --git a/block/io.c b/block/io.c
index 2a2ff84..41b4e9d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1120,32 +1120,35 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int ret = 0;
     bool need_flush = false;

-    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
-                                        BDRV_REQUEST_MAX_SECTORS);
+    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
+    int max_write_zeroes_sectors = max_write_zeroes >> BDRV_SECTOR_BITS;
+    int write_zeroes_sector_align =
+        bs->bl.pwrite_zeroes_alignment >> BDRV_SECTOR_BITS;

+    assert(nb_sectors <= BDRV_REQUEST_MAX_SECTORS);
     while (nb_sectors > 0 && !ret) {
         int num = nb_sectors;

         /* Align request.  Block drivers can expect the "bulk" of the request
          * to be aligned.
          */
-        if (bs->bl.write_zeroes_alignment
-            && num > bs->bl.write_zeroes_alignment) {
-            if (sector_num % bs->bl.write_zeroes_alignment != 0) {
+        if (write_zeroes_sector_align
+            && num > write_zeroes_sector_align) {
+            if (sector_num % write_zeroes_sector_align != 0) {
                 /* Make a small request up to the first aligned sector.  */
-                num = bs->bl.write_zeroes_alignment;
-                num -= sector_num % bs->bl.write_zeroes_alignment;
-            } else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
+                num = write_zeroes_sector_align;
+                num -= sector_num % write_zeroes_sector_align;
+            } else if ((sector_num + num) % write_zeroes_sector_align != 0) {
                 /* Shorten the request to the last aligned sector.  num cannot
-                 * underflow because num > bs->bl.write_zeroes_alignment.
+                 * underflow because num > write_zeroes_sector_align.
                  */
-                num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
+                num -= (sector_num + num) % write_zeroes_sector_align;
             }
         }

         /* limit request size */
-        if (num > max_write_zeroes) {
-            num = max_write_zeroes;
+        if (num > max_write_zeroes_sectors) {
+            num = max_write_zeroes_sectors;
         }

         ret = -ENOTSUP;
diff --git a/block/iscsi.c b/block/iscsi.c
index 10f3906..0acc3dc 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1706,12 +1706,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
     }

     if (iscsilun->bl.max_ws_len < 0xffffffff) {
-        bs->bl.max_write_zeroes =
-            sector_limits_lun2qemu(iscsilun->bl.max_ws_len, iscsilun);
+        bs->bl.max_pwrite_zeroes = iscsilun->bl.max_ws_len;
     }
     if (iscsilun->lbp.lbpws) {
-        bs->bl.write_zeroes_alignment =
-            sector_limits_lun2qemu(iscsilun->bl.opt_unmap_gran, iscsilun);
+        bs->bl.pwrite_zeroes_alignment = iscsilun->bl.opt_unmap_gran;
     }
     bs->bl.opt_transfer_length =
         sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
diff --git a/block/qcow2.c b/block/qcow2.c
index c9306a7..745b66f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1193,7 +1193,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;

-    bs->bl.write_zeroes_alignment = s->cluster_sectors;
+    bs->bl.pwrite_zeroes_alignment = s->cluster_sectors << BDRV_SECTOR_BITS;
 }

 static int qcow2_set_key(BlockDriverState *bs, const char *key)
diff --git a/block/qed.c b/block/qed.c
index b591d4a..0ab5b40 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -518,7 +518,7 @@ static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVQEDState *s = bs->opaque;

-    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
+    bs->bl.pwrite_zeroes_alignment = s->header.cluster_size;
 }

 /* We have nothing to do for QED reopen, stubs just return
diff --git a/block/vmdk.c b/block/vmdk.c
index 372e5ed..8494d63 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -998,9 +998,9 @@ static void vmdk_refresh_limits(BlockDriverState *bs, Error **errp)

     for (i = 0; i < s->num_extents; i++) {
         if (!s->extents[i].flat) {
-            bs->bl.write_zeroes_alignment =
-                MAX(bs->bl.write_zeroes_alignment,
-                    s->extents[i].cluster_sectors);
+            bs->bl.pwrite_zeroes_alignment =
+                MAX(bs->bl.pwrite_zeroes_alignment,
+                    s->extents[i].cluster_sectors << BDRV_SECTOR_BITS);
         }
     }
 }
-- 
2.5.5


* [Qemu-devel] [PATCH 03/13] block: Add .bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 01/13] block: Rename blk_write_zeroes() Eric Blake
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 13:02   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 04/13] block: Switch bdrv_write_zeroes() to byte interface Eric Blake
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

Update bdrv_co_do_write_zeroes() to be byte-based, and select
between the new byte-based bdrv_co_pwrite_zeroes() and the old
bdrv_co_write_zeroes().  The next patches will convert drivers,
then remove the old interface.
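
The byte-based splitting can be sketched in isolation as follows. This is a
simplified model of the head/bulk/tail logic in the diff below, with
hypothetical names and no driver calls; it assumes alignment > 0 and max > 0,
and ignores error handling and the bounce-buffer fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the next request for [offset, offset + count). */
static int demo_next_chunk(int64_t offset, int count, int alignment, int max)
{
    int num = count;

    assert(alignment > 0 && max > 0 && count > 0);
    if (count > alignment) {
        if (offset % alignment) {
            /* Unaligned head: go only as far as the next boundary. */
            num = alignment - (int)(offset % alignment);
        } else if ((offset + count) % alignment) {
            /* Aligned start, unaligned end: stop at the last boundary. */
            num -= (int)((offset + num) % alignment);
        }
    }
    /* Cap the request at the advertised maximum. */
    return num > max ? max : num;
}

/* Number of requests needed to cover the whole range. */
static int demo_count_chunks(int64_t offset, int count, int alignment, int max)
{
    int n = 0;

    while (count > 0) {
        int num = demo_next_chunk(offset, count, alignment, max);
        offset += num;
        count -= num;
        n++;
    }
    return n;
}
```

With a 64 KiB alignment and no effective cap, a request at offset 1000 for
200000 bytes splits into a 64536-byte head, a 131072-byte aligned bulk, and a
4392-byte tail.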

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block_int.h |  4 ++-
 block/io.c                | 81 +++++++++++++++++++++++++----------------------
 2 files changed, 47 insertions(+), 38 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4282ffd..fa7e3f9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -165,6 +165,8 @@ struct BlockDriver {
      */
     int coroutine_fn (*bdrv_co_write_zeroes)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
+    int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
+        int64_t offset, int count, BdrvRequestFlags flags);
     int coroutine_fn (*bdrv_co_discard)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors);
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
@@ -454,7 +456,7 @@ struct BlockDriverState {
     unsigned int request_alignment;
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
     unsigned int supported_write_flags;
-    /* Flags honored during write_zeroes (so far: BDRV_REQ_FUA,
+    /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
      * BDRV_REQ_MAY_UNMAP) */
     unsigned int supported_zero_flags;

diff --git a/block/io.c b/block/io.c
index 41b4e9d..c1d700b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,8 +42,8 @@ static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                          void *opaque,
                                          bool is_write);
 static void coroutine_fn bdrv_co_do_rw(void *opaque);
-static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
+static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int count, BdrvRequestFlags flags);

 static void bdrv_parent_drained_begin(BlockDriverState *bs)
 {
@@ -876,10 +876,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
         goto err;
     }

-    if (drv->bdrv_co_write_zeroes &&
+    if ((drv->bdrv_co_write_zeroes || drv->bdrv_co_pwrite_zeroes) &&
         buffer_is_zero(bounce_buffer, iov.iov_len)) {
-        ret = bdrv_co_do_write_zeroes(bs, cluster_sector_num,
-                                      cluster_nb_sectors, 0);
+        ret = bdrv_co_do_pwrite_zeroes(bs,
+                                       cluster_sector_num * BDRV_SECTOR_SIZE,
+                                       cluster_nb_sectors * BDRV_SECTOR_SIZE,
+                                       0);
     } else {
         /* This does not change the data on the disk, it is not necessary
          * to flush even in cache=writethrough mode.
@@ -1111,8 +1113,8 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,

 #define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768

-static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int count, BdrvRequestFlags flags)
 {
     BlockDriver *drv = bs->drv;
     QEMUIOVector qiov;
@@ -1121,40 +1123,45 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     bool need_flush = false;

     int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
-    int max_write_zeroes_sectors = max_write_zeroes >> BDRV_SECTOR_BITS;
-    int write_zeroes_sector_align =
-        bs->bl.pwrite_zeroes_alignment >> BDRV_SECTOR_BITS;
+    int alignment = MAX(bs->bl.pwrite_zeroes_alignment, BDRV_SECTOR_SIZE);

-    assert(nb_sectors <= BDRV_REQUEST_MAX_SECTORS);
-    while (nb_sectors > 0 && !ret) {
-        int num = nb_sectors;
+    while (count > 0 && !ret) {
+        int num = count;

         /* Align request.  Block drivers can expect the "bulk" of the request
          * to be aligned.
          */
-        if (write_zeroes_sector_align
-            && num > write_zeroes_sector_align) {
-            if (sector_num % write_zeroes_sector_align != 0) {
+        if (count > alignment) {
+            if (offset % alignment) {
                 /* Make a small request up to the first aligned sector.  */
-                num = write_zeroes_sector_align;
-                num -= sector_num % write_zeroes_sector_align;
-            } else if ((sector_num + num) % write_zeroes_sector_align != 0) {
+                num = alignment - (offset % alignment);
+            } else if ((offset + count) % alignment) {
                 /* Shorten the request to the last aligned sector.  num cannot
-                 * underflow because num > write_zeroes_sector_align.
+                 * underflow because num > alignment.
                  */
-                num -= (sector_num + num) % write_zeroes_sector_align;
+                num -= (offset + num) % alignment;
             }
         }

         /* limit request size */
-        if (num > max_write_zeroes_sectors) {
-            num = max_write_zeroes_sectors;
+        if (num > max_write_zeroes) {
+            num = max_write_zeroes;
         }

         ret = -ENOTSUP;
         /* First try the efficient write zeroes operation */
-        if (drv->bdrv_co_write_zeroes) {
-            ret = drv->bdrv_co_write_zeroes(bs, sector_num, num,
+        if (drv->bdrv_co_pwrite_zeroes) {
+            ret = drv->bdrv_co_pwrite_zeroes(bs, offset, num,
+                                             flags & bs->supported_zero_flags);
+            if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
+                !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
+                need_flush = true;
+            }
+        } else if (drv->bdrv_co_write_zeroes) {
+            assert(offset % BDRV_SECTOR_SIZE == 0);
+            assert(count % BDRV_SECTOR_SIZE == 0);
+            ret = drv->bdrv_co_write_zeroes(bs, offset >> BDRV_SECTOR_BITS,
+                                            num >> BDRV_SECTOR_BITS,
                                             flags & bs->supported_zero_flags);
             if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
                 !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
@@ -1177,33 +1184,31 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
                 write_flags &= ~BDRV_REQ_FUA;
                 need_flush = true;
             }
-            num = MIN(num, max_xfer_len);
-            iov.iov_len = num * BDRV_SECTOR_SIZE;
+            num = MIN(num, max_xfer_len << BDRV_SECTOR_BITS);
+            iov.iov_len = num;
             if (iov.iov_base == NULL) {
-                iov.iov_base = qemu_try_blockalign(bs, num * BDRV_SECTOR_SIZE);
+                iov.iov_base = qemu_try_blockalign(bs, num);
                 if (iov.iov_base == NULL) {
                     ret = -ENOMEM;
                     goto fail;
                 }
-                memset(iov.iov_base, 0, num * BDRV_SECTOR_SIZE);
+                memset(iov.iov_base, 0, num);
             }
             qemu_iovec_init_external(&qiov, &iov, 1);

-            ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
-                                      num * BDRV_SECTOR_SIZE, &qiov,
-                                      write_flags);
+            ret = bdrv_driver_pwritev(bs, offset, num, &qiov, write_flags);

             /* Keep bounce buffer around if it is big enough for all
              * all future requests.
              */
-            if (num < max_xfer_len) {
+            if (num < max_xfer_len << BDRV_SECTOR_BITS) {
                 qemu_vfree(iov.iov_base);
                 iov.iov_base = NULL;
             }
         }

-        sector_num += num;
-        nb_sectors -= num;
+        offset += num;
+        count -= num;
     }

 fail:
@@ -1241,7 +1246,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);

     if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
-        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_write_zeroes &&
+        !(flags & BDRV_REQ_ZERO_WRITE) &&
+        (drv->bdrv_co_pwrite_zeroes || drv->bdrv_co_write_zeroes) &&
         qemu_iovec_is_zero(qiov)) {
         flags |= BDRV_REQ_ZERO_WRITE;
         if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
@@ -1253,7 +1259,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
         /* Do nothing, write notifier decided to fail this request */
     } else if (flags & BDRV_REQ_ZERO_WRITE) {
         bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
-        ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors, flags);
+        ret = bdrv_co_do_pwrite_zeroes(bs, sector_num << BDRV_SECTOR_BITS,
+                                       nb_sectors << BDRV_SECTOR_BITS, flags);
     } else {
         bdrv_debug_event(bs, BLKDBG_PWRITEV);
         ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
-- 
2.5.5


* [Qemu-devel] [PATCH 04/13] block: Switch bdrv_write_zeroes() to byte interface
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (2 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 03/13] block: Add .bdrv_co_pwrite_zeroes() Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 13:18   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes() Eric Blake
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Jeff Cody, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Denis V. Lunev, Juan Quintela, Amit Shah

Rename to bdrv_pwrite_zeroes() to let the compiler ensure we
cater to the updated semantics.  Do the same for
bdrv_aio_write_zeroes() and bdrv_co_write_zeroes().  For now,
we still require sector alignment in the callers, via assertions.
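
A minimal sketch of what "sector alignment via assertions" can look like at a
boundary where a byte-based caller meets code that still works in sectors.
The helper names and DEMO_SECTOR_SIZE are hypothetical, and 512-byte sectors
are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 512-byte sectors. */
#define DEMO_SECTOR_SIZE 512

/* True if both the start and the length fall on sector boundaries. */
static bool demo_is_sector_aligned(int64_t offset, int count)
{
    return offset % DEMO_SECTOR_SIZE == 0 && count % DEMO_SECTOR_SIZE == 0;
}

/* Convert a byte range to sectors, asserting that nothing is lost. */
static void demo_bytes_to_sectors(int64_t offset, int count,
                                  int64_t *sector_num, int *nb_sectors)
{
    assert(demo_is_sector_aligned(offset, count));
    *sector_num = offset / DEMO_SECTOR_SIZE;
    *nb_sectors = count / DEMO_SECTOR_SIZE;
}
```

An unaligned request such as offset 1000 would trip the assertion instead of
being silently rounded, which is the point of keeping the check until every
layer underneath is byte-based.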

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block.h  | 16 ++++++++--------
 block/backup.c         |  7 ++++---
 block/blkreplay.c      |  4 +++-
 block/io.c             | 39 +++++++++++++++++++++++----------------
 block/mirror.c         |  7 ++++---
 block/parallels.c      |  4 +++-
 block/qcow2-cluster.c  |  3 +--
 block/qcow2.c          |  9 ++++-----
 block/raw_bsd.c        |  3 ++-
 migration/block.c      |  5 +++--
 tests/qemu-iotests/034 |  2 +-
 tests/qemu-iotests/154 |  2 +-
 trace-events           |  4 ++--
 13 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index b740af8..da4d9b8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -33,7 +33,7 @@ typedef struct BlockDriverInfo {
      * True if the driver can optimize writing zeroes by unmapping
      * sectors. This is equivalent to the BLKDISCARDZEROES ioctl in Linux
      * with the difference that in qemu a discard is allowed to silently
-     * fail. Therefore we have to use bdrv_write_zeroes with the
+     * fail. Therefore we have to use bdrv_pwrite_zeroes with the
      * BDRV_REQ_MAY_UNMAP flag for an optimized zero write with unmapping.
      * After this call the driver has to guarantee that the contents read
      * back as zero. It is additionally required that the block device is
@@ -227,11 +227,11 @@ int bdrv_read(BlockDriverState *bs, int64_t sector_num,
               uint8_t *buf, int nb_sectors);
 int bdrv_write(BlockDriverState *bs, int64_t sector_num,
                const uint8_t *buf, int nb_sectors);
-int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
-               int nb_sectors, BdrvRequestFlags flags);
-BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs, int64_t sector_num,
-                                  int nb_sectors, BdrvRequestFlags flags,
-                                  BlockCompletionFunc *cb, void *opaque);
+int bdrv_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                       int count, BdrvRequestFlags flags);
+BlockAIOCB *bdrv_aio_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                                   int count, BdrvRequestFlags flags,
+                                   BlockCompletionFunc *cb, void *opaque);
 int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags);
 int bdrv_pread(BlockDriverState *bs, int64_t offset,
                void *buf, int count);
@@ -254,8 +254,8 @@ int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
  * function is not suitable for zeroing the entire image in a single request
  * because it may allocate memory for the entire region.
  */
-int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
-    int nb_sectors, BdrvRequestFlags flags);
+int coroutine_fn bdrv_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+    int count, BdrvRequestFlags flags);
 BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     const char *backing_file);
 int bdrv_get_backing_file_depth(BlockDriverState *bs);
diff --git a/block/backup.c b/block/backup.c
index fec45e8..918ff4f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -154,9 +154,10 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
         }

         if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
-            ret = bdrv_co_write_zeroes(job->target,
-                                       start * sectors_per_cluster,
-                                       n, BDRV_REQ_MAY_UNMAP);
+            ret = bdrv_co_pwrite_zeroes(job->target,
+                                        start * job->cluster_size,
+                                        n << BDRV_SECTOR_BITS,
+                                        BDRV_REQ_MAY_UNMAP);
         } else {
             ret = bdrv_co_writev(job->target,
                                  start * sectors_per_cluster, n,
diff --git a/block/blkreplay.c b/block/blkreplay.c
index 42f1813..1a721ad 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -107,7 +107,9 @@ static int coroutine_fn blkreplay_co_write_zeroes(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
 {
     uint64_t reqid = request_id++;
-    int ret = bdrv_co_write_zeroes(bs->file->bs, sector_num, nb_sectors, flags);
+    int ret = bdrv_co_pwrite_zeroes(bs->file->bs,
+                                    sector_num << BDRV_SECTOR_BITS,
+                                    nb_sectors << BDRV_SECTOR_BITS, flags);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();

diff --git a/block/io.c b/block/io.c
index c1d700b..ea8135f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -603,18 +603,21 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
     return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true, 0);
 }

-int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
-                      int nb_sectors, BdrvRequestFlags flags)
+int bdrv_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                       int count, BdrvRequestFlags flags)
 {
-    return bdrv_rw_co(bs, sector_num, NULL, nb_sectors, true,
+    assert(offset % BDRV_SECTOR_SIZE == 0);
+    assert(count % BDRV_SECTOR_SIZE == 0);
+    return bdrv_rw_co(bs, offset >> BDRV_SECTOR_BITS, NULL,
+                      count >> BDRV_SECTOR_BITS, true,
                       BDRV_REQ_ZERO_WRITE | flags);
 }

 /*
- * Completely zero out a block device with the help of bdrv_write_zeroes.
+ * Completely zero out a block device with the help of bdrv_pwrite_zeroes.
  * The operation is sped up by checking the block status and only writing
  * zeroes to the device if they currently do not return zeroes. Optional
- * flags are passed through to bdrv_write_zeroes (e.g. BDRV_REQ_MAY_UNMAP,
+ * flags are passed through to bdrv_pwrite_zeroes (e.g. BDRV_REQ_MAY_UNMAP,
  * BDRV_REQ_FUA).
  *
  * Returns < 0 on error, 0 on success. For error codes see bdrv_write().
@@ -645,7 +648,8 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
             sector_num += n;
             continue;
         }
-        ret = bdrv_write_zeroes(bs, sector_num, n, flags);
+        ret = bdrv_pwrite_zeroes(bs, sector_num << BDRV_SECTOR_BITS,
+                                 n << BDRV_SECTOR_BITS, flags);
         if (ret < 0) {
             error_report("error writing zeroes at sector %" PRId64 ": %s",
                          sector_num, strerror(-ret));
@@ -1513,18 +1517,18 @@ int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
     return bdrv_co_do_writev(bs, sector_num, nb_sectors, qiov, 0);
 }

-int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs,
-                                      int64_t sector_num, int nb_sectors,
-                                      BdrvRequestFlags flags)
+int coroutine_fn bdrv_co_pwrite_zeroes(BlockDriverState *bs,
+                                       int64_t offset, int count,
+                                       BdrvRequestFlags flags)
 {
-    trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags);
+    trace_bdrv_co_pwrite_zeroes(bs, offset, count, flags);

     if (!(bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
     }

-    return bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL,
-                             BDRV_REQ_ZERO_WRITE | flags);
+    return bdrv_co_pwritev(bs, offset, count, NULL,
+                           BDRV_REQ_ZERO_WRITE | flags);
 }

 typedef struct BdrvCoGetBlockStatusData {
@@ -1876,13 +1880,16 @@ BlockAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
                                  cb, opaque, true);
 }

-BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags,
+BlockAIOCB *bdrv_aio_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int count, BdrvRequestFlags flags,
         BlockCompletionFunc *cb, void *opaque)
 {
-    trace_bdrv_aio_write_zeroes(bs, sector_num, nb_sectors, flags, opaque);
+    trace_bdrv_aio_pwrite_zeroes(bs, offset, count, flags, opaque);
+    assert(offset % BDRV_SECTOR_SIZE == 0);
+    assert(count % BDRV_SECTOR_SIZE == 0);

-    return bdrv_co_aio_rw_vector(bs, sector_num, NULL, nb_sectors,
+    return bdrv_co_aio_rw_vector(bs, offset >> BDRV_SECTOR_BITS, NULL,
+                                 count >> BDRV_SECTOR_BITS,
                                  BDRV_REQ_ZERO_WRITE | flags,
                                  cb, opaque, true);
 }
diff --git a/block/mirror.c b/block/mirror.c
index b9986d8..152d276 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -299,9 +299,10 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
         bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
                          mirror_write_complete, op);
     } else {
-        bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
-                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
-                              mirror_write_complete, op);
+        bdrv_aio_pwrite_zeroes(s->target, sector_num << BDRV_SECTOR_BITS,
+                               op->nb_sectors << BDRV_SECTOR_BITS,
+                               s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
+                               mirror_write_complete, op);
     }
 }

diff --git a/block/parallels.c b/block/parallels.c
index 99fc0f7..4621553 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -210,7 +210,9 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
         int ret;
         space += s->prealloc_size;
         if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
-            ret = bdrv_write_zeroes(bs->file->bs, s->data_end, space, 0);
+            ret = bdrv_pwrite_zeroes(bs->file->bs,
+                                     s->data_end << BDRV_SECTOR_BITS,
+                                     space << BDRV_SECTOR_BITS, 0);
         } else {
             ret = bdrv_truncate(bs->file->bs,
                                 (s->data_end + space) << BDRV_SECTOR_BITS);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 892e0fb..d901d89 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1765,8 +1765,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 goto fail;
             }

-            ret = bdrv_write_zeroes(bs->file->bs, offset / BDRV_SECTOR_SIZE,
-                                    s->cluster_sectors, 0);
+            ret = bdrv_pwrite_zeroes(bs->file->bs, offset, s->cluster_size, 0);
             if (ret < 0) {
                 if (!preallocated) {
                     qcow2_free_clusters(bs, offset, s->cluster_size,
diff --git a/block/qcow2.c b/block/qcow2.c
index 745b66f..978694e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2664,8 +2664,8 @@ static int make_completely_empty(BlockDriverState *bs)
     /* After this call, neither the in-memory nor the on-disk refcount
      * information accurately describe the actual references */

-    ret = bdrv_write_zeroes(bs->file->bs, s->l1_table_offset / BDRV_SECTOR_SIZE,
-                            l1_clusters * s->cluster_sectors, 0);
+    ret = bdrv_pwrite_zeroes(bs->file->bs, s->l1_table_offset,
+                             l1_clusters * s->cluster_size, 0);
     if (ret < 0) {
         goto fail_broken_refcounts;
     }
@@ -2678,9 +2678,8 @@ static int make_completely_empty(BlockDriverState *bs)
      * overwrite parts of the existing refcount and L1 table, which is not
      * an issue because the dirty flag is set, complete data loss is in fact
      * desired and partial data loss is consequently fine as well */
-    ret = bdrv_write_zeroes(bs->file->bs, s->cluster_size / BDRV_SECTOR_SIZE,
-                            (2 + l1_clusters) * s->cluster_size /
-                            BDRV_SECTOR_SIZE, 0);
+    ret = bdrv_pwrite_zeroes(bs->file->bs, s->cluster_size,
+                             (2 + l1_clusters) * s->cluster_size, 0);
     /* This call (even if it failed overall) may have overwritten on-disk
      * refcount structures; in that case, the in-memory refcount information
      * will probably differ from the on-disk information which makes the BDS
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 3385ed4..d9adf90 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -131,7 +131,8 @@ static int coroutine_fn raw_co_write_zeroes(BlockDriverState *bs,
                                             int64_t sector_num, int nb_sectors,
                                             BdrvRequestFlags flags)
 {
-    return bdrv_co_write_zeroes(bs->file->bs, sector_num, nb_sectors, flags);
+    return bdrv_co_pwrite_zeroes(bs->file->bs, sector_num << BDRV_SECTOR_BITS,
+                                 nb_sectors << BDRV_SECTOR_BITS, flags);
 }

 static int coroutine_fn raw_co_discard(BlockDriverState *bs,
diff --git a/migration/block.c b/migration/block.c
index e0628d1..16cc1f8 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -883,8 +883,9 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
             }

             if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
-                ret = bdrv_write_zeroes(bs, addr, nr_sectors,
-                                        BDRV_REQ_MAY_UNMAP);
+                ret = bdrv_pwrite_zeroes(bs, addr << BDRV_SECTOR_BITS,
+                                         nr_sectors << BDRV_SECTOR_BITS,
+                                         BDRV_REQ_MAY_UNMAP);
             } else {
                 buf = g_malloc(BLOCK_SIZE);
                 qemu_get_buffer(f, buf, BLOCK_SIZE);
diff --git a/tests/qemu-iotests/034 b/tests/qemu-iotests/034
index c711cfc..1b28bda 100755
--- a/tests/qemu-iotests/034
+++ b/tests/qemu-iotests/034
@@ -1,6 +1,6 @@
 #!/bin/bash
 #
-# Test bdrv_write_zeroes with backing files
+# Test bdrv_pwrite_zeroes with backing files (see also 154)
 #
 # Copyright (C) 2012 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index 23f1b3a..619e7b9 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -1,6 +1,6 @@
 #!/bin/bash
 #
-# qcow2 specific bdrv_write_zeroes tests with backing files (complements 034)
+# qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034)
 #
 # Copyright (C) 2016 Red Hat, Inc.
 #
diff --git a/trace-events b/trace-events
index b53c354..97be1d7 100644
--- a/trace-events
+++ b/trace-events
@@ -66,12 +66,12 @@ bdrv_aio_discard(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs
 bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
 bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
 bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
-bdrv_aio_write_zeroes(void *bs, int64_t sector_num, int nb_sectors, int flags, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x opaque %p"
+bdrv_aio_pwrite_zeroes(void *bs, int64_t offset, int count, int flags, void *opaque) "bs %p offset %"PRId64" count %d flags %#x opaque %p"
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_readv_no_serialising(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
-bdrv_co_write_zeroes(void *bs, int64_t sector_num, int nb_sector, int flags) "bs %p sector_num %"PRId64" nb_sectors %d flags %#x"
+bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags %#x"
 bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"

 # block/stream.c
-- 
2.5.5


* [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (3 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 04/13] block: Switch bdrv_write_zeroes() to byte interface Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 13:34   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 06/13] qcow2: " Eric Blake
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Ronnie Sahlberg, Paolo Bonzini, Peter Lieven,
	Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

As this is the first byte-based iscsi interface, convert
is_request_lun_aligned() into two versions, one for sectors
and one for bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
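Illustration only, not part of the patch: a standalone sketch of the
split described above, where the byte-based check does the real work
and the sector-based variant is a thin wrapper that scales its
arguments. The names and the plain block_size parameter are
hypothetical simplifications (the driver passes an IscsiLun and
reports an error on misalignment); the last helper mirrors how a byte
offset maps to a LUN block number once alignment is known.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* assumed 512-byte sectors */

/* Byte-based check: both offset and length must be multiples of the
 * LUN block size. */
static bool byte_request_aligned(int64_t offset, int count,
                                 uint32_t block_size)
{
    int64_t bs = block_size;   /* widen once to avoid mixed-sign math */

    return offset % bs == 0 && count % bs == 0;
}

/* Sector-based wrapper for callers that have not been converted. */
static bool sector_request_aligned(int64_t sector_num, int nb_sectors,
                                   uint32_t block_size)
{
    return byte_request_aligned(sector_num << SECTOR_BITS,
                                nb_sectors << SECTOR_BITS, block_size);
}

/* With alignment established, bytes convert to LUN blocks by division. */
static int64_t byte_to_lun_block(int64_t offset, uint32_t block_size)
{
    return offset / block_size;
}
```
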
 block/iscsi.c | 53 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 0acc3dc..3dbfd57 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -401,18 +401,25 @@ static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
     return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
 }

-static bool is_request_lun_aligned(int64_t sector_num, int nb_sectors,
-                                      IscsiLun *iscsilun)
+static bool is_byte_request_lun_aligned(int64_t offset, int count,
+                                        IscsiLun *iscsilun)
 {
-    if ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
-        (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) {
-            error_report("iSCSI misaligned request: "
-                         "iscsilun->block_size %u, sector_num %" PRIi64
-                         ", nb_sectors %d",
-                         iscsilun->block_size, sector_num, nb_sectors);
-            return 0;
+    if (offset % iscsilun->block_size || count % iscsilun->block_size) {
+        error_report("iSCSI misaligned request: "
+                     "iscsilun->block_size %u, offset %" PRIi64
+                     ", count %d",
+                     iscsilun->block_size, offset, count);
+        return false;
     }
-    return 1;
+    return true;
+}
+
+static bool is_sector_request_lun_aligned(int64_t sector_num, int nb_sectors,
+                                          IscsiLun *iscsilun)
+{
+    return is_byte_request_lun_aligned(sector_num << BDRV_SECTOR_BITS,
+                                       nb_sectors << BDRV_SECTOR_BITS,
+                                       iscsilun);
 }

 static unsigned long *iscsi_allocationmap_init(IscsiLun *iscsilun)
@@ -461,7 +468,7 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
     if (fua) {
         assert(iscsilun->dpofua);
     }
-    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+    if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         return -EINVAL;
     }

@@ -541,7 +548,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,

     iscsi_co_init_iscsitask(iscsilun, &iTask);

-    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+    if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         ret = -EINVAL;
         goto out;
     }
@@ -638,7 +645,7 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
     uint64_t lba;
     uint32_t num_sectors;

-    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+    if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         return -EINVAL;
     }

@@ -918,7 +925,7 @@ coroutine_fn iscsi_co_discard(BlockDriverState *bs, int64_t sector_num,
     struct IscsiTask iTask;
     struct unmap_list list;

-    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+    if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         return -EINVAL;
     }

@@ -969,8 +976,8 @@ retry:
 }

 static int
-coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
-                                   int nb_sectors, BdrvRequestFlags flags)
+coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                                    int count, BdrvRequestFlags flags)
 {
     IscsiLun *iscsilun = bs->opaque;
     struct IscsiTask iTask;
@@ -978,7 +985,7 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     uint32_t nb_blocks;
     bool use_16_for_ws = iscsilun->use_16_for_rw;

-    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+    if (!is_byte_request_lun_aligned(offset, count, iscsilun)) {
         return -EINVAL;
     }

@@ -1000,8 +1007,8 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
         return -ENOTSUP;
     }

-    lba = sector_qemu2lun(sector_num, iscsilun);
-    nb_blocks = sector_qemu2lun(nb_sectors, iscsilun);
+    lba = offset / iscsilun->block_size;
+    nb_blocks = count / iscsilun->block_size;

     if (iscsilun->zeroblock == NULL) {
         iscsilun->zeroblock = g_try_malloc0(iscsilun->block_size);
@@ -1057,9 +1064,11 @@ retry:
     }

     if (flags & BDRV_REQ_MAY_UNMAP) {
-        iscsi_allocationmap_clear(iscsilun, sector_num, nb_sectors);
+        iscsi_allocationmap_clear(iscsilun, offset >> BDRV_SECTOR_BITS,
+                                  count >> BDRV_SECTOR_BITS);
     } else {
-        iscsi_allocationmap_set(iscsilun, sector_num, nb_sectors);
+        iscsi_allocationmap_set(iscsilun, offset >> BDRV_SECTOR_BITS,
+                                count >> BDRV_SECTOR_BITS);
     }

     return 0;
@@ -1842,7 +1851,7 @@ static BlockDriver bdrv_iscsi = {

     .bdrv_co_get_block_status = iscsi_co_get_block_status,
     .bdrv_co_discard      = iscsi_co_discard,
-    .bdrv_co_write_zeroes = iscsi_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes = iscsi_co_pwrite_zeroes,
     .bdrv_co_readv         = iscsi_co_readv,
     .bdrv_co_writev_flags  = iscsi_co_writev_flags,
     .bdrv_co_flush_to_disk = iscsi_co_flush,
-- 
2.5.5


* [Qemu-devel] [PATCH 06/13] qcow2: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (4 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes() Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 13:53   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 07/13] blkreplay: " Eric Blake
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

There are still opportunities to optimize the qcow2 handling
of zero clusters.  For example, if the backing file only has
non-zero data in the portion about to be overwritten, then
we could widen the request and make the entire cluster zero,
rather than falling back to -ENOTSUP.  But for this patch,
intentionally leave the semantics unchanged, even if not
optimal.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
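Illustration only, not part of the patch: a standalone sketch of the
head/tail arithmetic that widens an unaligned request to whole
clusters. The Range type and widen_to_clusters() are hypothetical
names. In the driver, the widened range is only used when every
cluster it touches already reads as zero; otherwise the function
still returns -ENOTSUP, as before.

```c
#include <stdint.h>

typedef struct {
    int64_t offset;   /* start of the widened request, in bytes */
    int64_t count;    /* length of the widened request, in bytes */
} Range;

/* Round the start down and the end up to cluster boundaries. */
static Range widen_to_clusters(int64_t offset, int64_t count,
                               int64_t cluster_size)
{
    int64_t head = offset % cluster_size;            /* bytes before start */
    int64_t tail = (offset + count) % cluster_size;  /* bytes into last cluster */
    Range r = { offset - head, count + head };

    if (tail) {
        r.count += cluster_size - tail;
    }
    return r;
}
```

With 64k clusters, a 4k request starting 2k before a cluster boundary
(offset 63488) widens to offset 0 and count 131072, that is, exactly
two clusters. This is the case the comment in the hunk below
describes.
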
 block/qcow2.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 978694e..3522fc0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2428,43 +2428,47 @@ static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start)
     return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO;
 }

-static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int count, BdrvRequestFlags flags)
 {
     int ret;
     BDRVQcow2State *s = bs->opaque;

-    int head = sector_num % s->cluster_sectors;
-    int tail = (sector_num + nb_sectors) % s->cluster_sectors;
+    int head = offset % s->cluster_size;
+    int tail = (offset + count) % s->cluster_size;

+    /* Widen the write to a full cluster, if the cluster already reads
+     * as zero. */
     if (head != 0 || tail != 0) {
-        int64_t cl_end = -1;
+        int64_t tail_sector = 0;

-        sector_num -= head;
-        nb_sectors += head;
-
-        if (tail != 0) {
-            nb_sectors += s->cluster_sectors - tail;
+        offset -= head;
+        count += head;
+        if (tail) {
+            count += s->cluster_size - tail;
         }

-        if (!is_zero_cluster(bs, sector_num)) {
+        if (!is_zero_cluster(bs, offset >> BDRV_SECTOR_BITS)) {
             return -ENOTSUP;
         }

-        if (nb_sectors > s->cluster_sectors) {
-            /* Technically the request can cover 2 clusters, f.e. 4k write
-               at s->cluster_sectors - 2k offset. One of these cluster can
-               be zeroed, one unallocated */
-            cl_end = sector_num + nb_sectors - s->cluster_sectors;
-            if (!is_zero_cluster(bs, cl_end)) {
+        if (count > s->cluster_size) {
+            /* Technically the request can cover 2 clusters, e.g. a 4k
+             * write at offset s->cluster_size - 2k. One of these
+             * clusters can be zeroed, one unallocated. Anything larger
+             * and the front end already split it to alignment
+             * boundaries. */
+            assert(count == 2 * s->cluster_size);
+            tail_sector = (offset >> BDRV_SECTOR_BITS) + s->cluster_sectors;
+            if (!is_zero_cluster(bs, tail_sector)) {
                 return -ENOTSUP;
             }
         }

         qemu_co_mutex_lock(&s->lock);
         /* We can have new write after previous check */
-        if (!is_zero_cluster_top_locked(bs, sector_num) ||
-                (cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) {
+        if (!is_zero_cluster_top_locked(bs, offset >> BDRV_SECTOR_BITS) ||
+            (tail_sector && !is_zero_cluster_top_locked(bs, tail_sector))) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
@@ -2473,7 +2477,7 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
     }

     /* Whatever is left can use real zero clusters */
-    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
+    ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS);
     qemu_co_mutex_unlock(&s->lock);

     return ret;
@@ -3380,7 +3384,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_co_writev         = qcow2_co_writev,
     .bdrv_co_flush_to_os    = qcow2_co_flush_to_os,

-    .bdrv_co_write_zeroes   = qcow2_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes  = qcow2_co_pwrite_zeroes,
     .bdrv_co_discard        = qcow2_co_discard,
     .bdrv_truncate          = qcow2_truncate,
     .bdrv_write_compressed  = qcow2_write_compressed,
-- 
2.5.5


* [Qemu-devel] [PATCH 07/13] blkreplay: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (5 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 06/13] qcow2: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 13:54   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 08/13] gluster: " Eric Blake
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/blkreplay.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 1a721ad..525c2d5 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -103,13 +103,11 @@ static int coroutine_fn blkreplay_co_writev(BlockDriverState *bs,
     return ret;
 }

-static int coroutine_fn blkreplay_co_write_zeroes(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+static int coroutine_fn blkreplay_co_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int count, BdrvRequestFlags flags)
 {
     uint64_t reqid = request_id++;
-    int ret = bdrv_co_pwrite_zeroes(bs->file->bs,
-                                    sector_num << BDRV_SECTOR_BITS,
-                                    nb_sectors << BDRV_SECTOR_BITS, flags);
+    int ret = bdrv_co_pwrite_zeroes(bs->file->bs, offset, count, flags);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();

@@ -149,7 +147,7 @@ static BlockDriver bdrv_blkreplay = {
     .bdrv_co_readv          = blkreplay_co_readv,
     .bdrv_co_writev         = blkreplay_co_writev,

-    .bdrv_co_write_zeroes   = blkreplay_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes  = blkreplay_co_pwrite_zeroes,
     .bdrv_co_discard        = blkreplay_co_discard,
     .bdrv_co_flush          = blkreplay_co_flush,
 };
-- 
2.5.5


* [Qemu-devel] [PATCH 08/13] gluster: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (6 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 07/13] blkreplay: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 13:57   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 09/13] qed: " Eric Blake
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Jeff Cody, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/gluster.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index a8aaacf..15aff4b 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -454,14 +454,13 @@ static void qemu_gluster_reopen_abort(BDRVReopenState *state)
 }

 #ifdef CONFIG_GLUSTERFS_ZEROFILL
-static coroutine_fn int qemu_gluster_co_write_zeroes(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int count, BdrvRequestFlags flags)
 {
     int ret;
     GlusterAIOCB acb;
     BDRVGlusterState *s = bs->opaque;
-    off_t size = nb_sectors * BDRV_SECTOR_SIZE;
-    off_t offset = sector_num * BDRV_SECTOR_SIZE;
+    off_t size = count;

     acb.size = size;
     acb.ret = 0;
@@ -769,7 +768,7 @@ static BlockDriver bdrv_gluster = {
     .bdrv_co_discard              = qemu_gluster_co_discard,
 #endif
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
-    .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes        = qemu_gluster_co_pwrite_zeroes,
 #endif
     .create_opts                  = &qemu_gluster_create_opts,
 };
@@ -796,7 +795,7 @@ static BlockDriver bdrv_gluster_tcp = {
     .bdrv_co_discard              = qemu_gluster_co_discard,
 #endif
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
-    .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes        = qemu_gluster_co_pwrite_zeroes,
 #endif
     .create_opts                  = &qemu_gluster_create_opts,
 };
@@ -823,7 +822,7 @@ static BlockDriver bdrv_gluster_unix = {
     .bdrv_co_discard              = qemu_gluster_co_discard,
 #endif
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
-    .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes        = qemu_gluster_co_pwrite_zeroes,
 #endif
     .create_opts                  = &qemu_gluster_create_opts,
 };
@@ -850,7 +849,7 @@ static BlockDriver bdrv_gluster_rdma = {
     .bdrv_co_discard              = qemu_gluster_co_discard,
 #endif
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
-    .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes        = qemu_gluster_co_pwrite_zeroes,
 #endif
     .create_opts                  = &qemu_gluster_create_opts,
 };
-- 
2.5.5


* [Qemu-devel] [PATCH 09/13] qed: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (7 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 08/13] gluster: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 14:07   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 10/13] raw-posix: " Eric Blake
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

Kill an abuse of the comma operator while at it (fortunately,
the semantics were still right).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/qed.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 0ab5b40..a0be886 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1419,7 +1419,7 @@ typedef struct {
     bool done;
 } QEDWriteZeroesCB;

-static void coroutine_fn qed_co_write_zeroes_cb(void *opaque, int ret)
+static void coroutine_fn qed_co_pwrite_zeroes_cb(void *opaque, int ret)
 {
     QEDWriteZeroesCB *cb = opaque;

@@ -1430,10 +1430,10 @@ static void coroutine_fn qed_co_write_zeroes_cb(void *opaque, int ret)
     }
 }

-static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
-                                                 int64_t sector_num,
-                                                 int nb_sectors,
-                                                 BdrvRequestFlags flags)
+static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
+                                                  int64_t offset,
+                                                  int count,
+                                                  BdrvRequestFlags flags)
 {
     BlockAIOCB *blockacb;
     BDRVQEDState *s = bs->opaque;
@@ -1443,10 +1443,10 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,

     /* Refuse if there are untouched backing file sectors */
     if (bs->backing) {
-        if (qed_offset_into_cluster(s, sector_num * BDRV_SECTOR_SIZE) != 0) {
+        if (qed_offset_into_cluster(s, offset) != 0) {
             return -ENOTSUP;
         }
-        if (qed_offset_into_cluster(s, nb_sectors * BDRV_SECTOR_SIZE) != 0) {
+        if (qed_offset_into_cluster(s, count) != 0) {
             return -ENOTSUP;
         }
     }
@@ -1454,12 +1454,13 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
     /* Zero writes start without an I/O buffer.  If a buffer becomes necessary
      * then it will be allocated during request processing.
      */
-    iov.iov_base = NULL,
-    iov.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
+    iov.iov_base = NULL;
+    iov.iov_len = count;

     qemu_iovec_init_external(&qiov, &iov, 1);
-    blockacb = qed_aio_setup(bs, sector_num, &qiov, nb_sectors,
-                             qed_co_write_zeroes_cb, &cb,
+    blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, &qiov,
+                             count >> BDRV_SECTOR_BITS,
+                             qed_co_pwrite_zeroes_cb, &cb,
                              QED_AIOCB_WRITE | QED_AIOCB_ZERO);
     if (!blockacb) {
         return -EIO;
@@ -1664,7 +1665,7 @@ static BlockDriver bdrv_qed = {
     .bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
     .bdrv_aio_readv           = bdrv_qed_aio_readv,
     .bdrv_aio_writev          = bdrv_qed_aio_writev,
-    .bdrv_co_write_zeroes     = bdrv_qed_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes    = bdrv_qed_co_pwrite_zeroes,
     .bdrv_truncate            = bdrv_qed_truncate,
     .bdrv_getlength           = bdrv_qed_getlength,
     .bdrv_get_info            = bdrv_qed_get_info,
-- 
2.5.5
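
The comma-operator remark in the commit message above can be illustrated with a small standalone sketch. The names used here (demo_iovec, demo_consume, fill_with_commas, fill_with_semicolons) are hypothetical and not part of QEMU. The sketch only assumes what the diff shows: in the old code the two assignments and the call that followed them formed one expression statement, which C evaluates left to right, so the result matched the semicolon version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iovec; not a QEMU type. */
struct demo_iovec {
    void *base;
    size_t len;
};

/* Hypothetical consumer standing in for the call that followed the
 * assignments; it reports the length it sees when base is NULL. */
static size_t demo_consume(const struct demo_iovec *v)
{
    assert(v != NULL);
    return v->base == NULL ? v->len : 0;
}

/* Old style: the commas chain both assignments and the call into ONE
 * expression statement, evaluated strictly left to right. */
static size_t fill_with_commas(size_t n)
{
    struct demo_iovec v;
    size_t seen;

    v.base = NULL,
    v.len = n,
    seen = demo_consume(&v);
    return seen;
}

/* New style: three separate statements with the same effect. */
static size_t fill_with_semicolons(size_t n)
{
    struct demo_iovec v;
    size_t seen;

    v.base = NULL;
    v.len = n;
    seen = demo_consume(&v);
    return seen;
}
```

Both helpers return the same value for any n; the semicolon form is simply harder to break by accident, for example when a later edit inserts a declaration or a control statement between the lines.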

* [Qemu-devel] [PATCH 10/13] raw-posix: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (8 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 09/13] qed: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 14:20   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 11/13] raw_bsd: " Eric Blake
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/raw-posix.c | 37 +++++++++++++++----------------------
 trace-events      |  2 +-
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index a4f5a1b..bb691f6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1252,8 +1252,7 @@ static int aio_worker(void *arg)
 }

 static int paio_submit_co(BlockDriverState *bs, int fd,
-        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-        int type)
+                          int64_t offset, int count, int type)
 {
     RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
     ThreadPool *pool;
@@ -1262,16 +1261,10 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
     acb->aio_type = type;
     acb->aio_fildes = fd;

-    acb->aio_nbytes = nb_sectors * BDRV_SECTOR_SIZE;
-    acb->aio_offset = sector_num * BDRV_SECTOR_SIZE;
+    acb->aio_nbytes = count;
+    acb->aio_offset = offset;

-    if (qiov) {
-        acb->aio_iov = qiov->iov;
-        acb->aio_niov = qiov->niov;
-        assert(qiov->size == acb->aio_nbytes);
-    }
-
-    trace_paio_submit_co(sector_num, nb_sectors, type);
+    trace_paio_submit_co(offset, count, type);
     pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
     return thread_pool_submit_co(pool, aio_worker, acb);
 }
@@ -1868,17 +1861,17 @@ static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
                        cb, opaque, QEMU_AIO_DISCARD);
 }

-static int coroutine_fn raw_co_write_zeroes(
-    BlockDriverState *bs, int64_t sector_num,
-    int nb_sectors, BdrvRequestFlags flags)
+static int coroutine_fn raw_co_pwrite_zeroes(
+    BlockDriverState *bs, int64_t offset,
+    int count, BdrvRequestFlags flags)
 {
     BDRVRawState *s = bs->opaque;

     if (!(flags & BDRV_REQ_MAY_UNMAP)) {
-        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+        return paio_submit_co(bs, s->fd, offset, count,
                               QEMU_AIO_WRITE_ZEROES);
     } else if (s->discard_zeroes) {
-        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+        return paio_submit_co(bs, s->fd, offset, count,
                               QEMU_AIO_DISCARD);
     }
     return -ENOTSUP;
@@ -1931,7 +1924,7 @@ BlockDriver bdrv_file = {
     .bdrv_create = raw_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_get_block_status = raw_co_get_block_status,
-    .bdrv_co_write_zeroes = raw_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,

     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
@@ -2293,8 +2286,8 @@ static coroutine_fn BlockAIOCB *hdev_aio_discard(BlockDriverState *bs,
                        cb, opaque, QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
 }

-static coroutine_fn int hdev_co_write_zeroes(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+static coroutine_fn int hdev_co_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int count, BdrvRequestFlags flags)
 {
     BDRVRawState *s = bs->opaque;
     int rc;
@@ -2304,10 +2297,10 @@ static coroutine_fn int hdev_co_write_zeroes(BlockDriverState *bs,
         return rc;
     }
     if (!(flags & BDRV_REQ_MAY_UNMAP)) {
-        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+        return paio_submit_co(bs, s->fd, offset, count,
                               QEMU_AIO_WRITE_ZEROES|QEMU_AIO_BLKDEV);
     } else if (s->discard_zeroes) {
-        return paio_submit_co(bs, s->fd, sector_num, NULL, nb_sectors,
+        return paio_submit_co(bs, s->fd, offset, count,
                               QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
     }
     return -ENOTSUP;
@@ -2379,7 +2372,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_reopen_abort   = raw_reopen_abort,
     .bdrv_create         = hdev_create,
     .create_opts         = &raw_create_opts,
-    .bdrv_co_write_zeroes = hdev_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,

     .bdrv_aio_readv	= raw_aio_readv,
     .bdrv_aio_writev	= raw_aio_writev,
diff --git a/trace-events b/trace-events
index 97be1d7..b5b03ae 100644
--- a/trace-events
+++ b/trace-events
@@ -130,7 +130,7 @@ thread_pool_cancel(void *req, void *opaque) "req %p opaque %p"

 # block/raw-win32.c
 # block/raw-posix.c
-paio_submit_co(int64_t sector_num, int nb_sectors, int type) "sector_num %"PRId64" nb_sectors %d type %d"
+paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d"
 paio_submit(void *acb, void *opaque, int64_t sector_num, int nb_sectors, int type) "acb %p opaque %p sector_num %"PRId64" nb_sectors %d type %d"

 # ioport.c
-- 
2.5.5
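
The decision made by raw_co_pwrite_zeroes() and hdev_co_pwrite_zeroes() in the patch above can be sketched in isolation. This is a minimal sketch, not the QEMU code: DEMO_REQ_MAY_UNMAP, the demo_aio_type values and demo_pick_zero_op() are hypothetical stand-ins, and the numeric values are chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins; the real flag and AIO type values live in QEMU. */
enum { DEMO_REQ_MAY_UNMAP = 1 << 2 };
enum demo_aio_type { DEMO_AIO_WRITE_ZEROES = 1, DEMO_AIO_DISCARD = 2 };

/* Returns the AIO type to submit, or -ENOTSUP so that the caller can
 * fall back to another way of zeroing the range. */
static int demo_pick_zero_op(unsigned flags, bool discard_zeroes)
{
    if (!(flags & DEMO_REQ_MAY_UNMAP)) {
        /* Caller wants the range to stay allocated. */
        return DEMO_AIO_WRITE_ZEROES;
    } else if (discard_zeroes) {
        /* Unmapping is allowed and discarded ranges read back as zero. */
        return DEMO_AIO_DISCARD;
    }
    return -ENOTSUP;
}
```

Note that only the unit of the offset and count arguments changes in the patch; this selection logic is the same before and after the conversion.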

* [Qemu-devel] [PATCH 11/13] raw_bsd: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (9 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 10/13] raw-posix: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 14:20   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 12/13] vmdk: " Eric Blake
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/raw_bsd.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index d9adf90..b1d5237 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -127,12 +127,11 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
            (sector_num << BDRV_SECTOR_BITS);
 }

-static int coroutine_fn raw_co_write_zeroes(BlockDriverState *bs,
-                                            int64_t sector_num, int nb_sectors,
-                                            BdrvRequestFlags flags)
+static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
+                                             int64_t offset, int count,
+                                             BdrvRequestFlags flags)
 {
-    return bdrv_co_pwrite_zeroes(bs->file->bs, sector_num << BDRV_SECTOR_BITS,
-                                 nb_sectors << BDRV_SECTOR_BITS, flags);
+    return bdrv_co_pwrite_zeroes(bs->file->bs, offset, count, flags);
 }

 static int coroutine_fn raw_co_discard(BlockDriverState *bs,
@@ -253,7 +252,7 @@ BlockDriver bdrv_raw = {
     .bdrv_create          = &raw_create,
     .bdrv_co_readv        = &raw_co_readv,
     .bdrv_co_writev_flags = &raw_co_writev_flags,
-    .bdrv_co_write_zeroes = &raw_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
     .bdrv_co_discard      = &raw_co_discard,
     .bdrv_co_get_block_status = &raw_co_get_block_status,
     .bdrv_truncate        = &raw_truncate,
-- 
2.5.5

* [Qemu-devel] [PATCH 12/13] vmdk: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (10 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 11/13] raw_bsd: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 14:23   ` Kevin Wolf
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes() Eric Blake
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Fam Zheng, Max Reitz

Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/vmdk.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 8494d63..284d7a0 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1704,15 +1704,14 @@ static int vmdk_write_compressed(BlockDriverState *bs,
     }
 }

-static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
-                                             int64_t sector_num,
-                                             int nb_sectors,
-                                             BdrvRequestFlags flags)
+static int coroutine_fn vmdk_co_pwrite_zeroes(BlockDriverState *bs,
+                                              int64_t offset,
+                                              int count,
+                                              BdrvRequestFlags flags)
 {
     int ret;
     BDRVVmdkState *s = bs->opaque;
-    uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
-    uint64_t bytes = nb_sectors * BDRV_SECTOR_SIZE;
+    uint64_t bytes = count;

     qemu_co_mutex_lock(&s->lock);
     /* write zeroes could fail if sectors not aligned to cluster, test it with
@@ -2403,7 +2402,7 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_co_preadv               = vmdk_co_preadv,
     .bdrv_co_pwritev              = vmdk_co_pwritev,
     .bdrv_write_compressed        = vmdk_write_compressed,
-    .bdrv_co_write_zeroes         = vmdk_co_write_zeroes,
+    .bdrv_co_pwrite_zeroes        = vmdk_co_pwrite_zeroes,
     .bdrv_close                   = vmdk_close,
     .bdrv_create                  = vmdk_create,
     .bdrv_co_flush_to_disk        = vmdk_co_flush,
-- 
2.5.5

* [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes()
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (11 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 12/13] vmdk: " Eric Blake
@ 2016-05-24 22:25 ` Eric Blake
  2016-05-25 14:24   ` Kevin Wolf
  2016-05-25 11:02 ` [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Kevin Wolf
  2016-06-01 15:35 ` Kevin Wolf
  14 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-24 22:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

Now that all drivers have been converted to a byte interface,
we no longer need a sector interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/block/block_int.h |  2 --
 block/io.c                | 15 ++-------------
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index fa7e3f9..129263a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -163,8 +163,6 @@ struct BlockDriver {
      * function pointer may be NULL or return -ENOSUP and .bdrv_co_writev()
      * will be called instead.
      */
-    int coroutine_fn (*bdrv_co_write_zeroes)(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
     int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
         int64_t offset, int count, BdrvRequestFlags flags);
     int coroutine_fn (*bdrv_co_discard)(BlockDriverState *bs,
diff --git a/block/io.c b/block/io.c
index ea8135f..4577228 100644
--- a/block/io.c
+++ b/block/io.c
@@ -880,7 +880,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
         goto err;
     }

-    if ((drv->bdrv_co_write_zeroes || drv->bdrv_co_pwrite_zeroes) &&
+    if (drv->bdrv_co_pwrite_zeroes &&
         buffer_is_zero(bounce_buffer, iov.iov_len)) {
         ret = bdrv_co_do_pwrite_zeroes(bs,
                                        cluster_sector_num * BDRV_SECTOR_SIZE,
@@ -1161,16 +1161,6 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
                 !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
                 need_flush = true;
             }
-        } else if (drv->bdrv_co_write_zeroes) {
-            assert(offset % BDRV_SECTOR_SIZE == 0);
-            assert(count % BDRV_SECTOR_SIZE == 0);
-            ret = drv->bdrv_co_write_zeroes(bs, offset >> BDRV_SECTOR_BITS,
-                                            num >> BDRV_SECTOR_BITS,
-                                            flags & bs->supported_zero_flags);
-            if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
-                !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
-                need_flush = true;
-            }
         } else {
             assert(!bs->supported_zero_flags);
         }
@@ -1250,8 +1240,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);

     if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
-        !(flags & BDRV_REQ_ZERO_WRITE) &&
-        (drv->bdrv_co_pwrite_zeroes || drv->bdrv_co_write_zeroes) &&
+        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
         qemu_iovec_is_zero(qiov)) {
         flags |= BDRV_REQ_ZERO_WRITE;
         if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
-- 
2.5.5
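
The contract described in the block_int.h comment touched above (the callback may be NULL or may return -ENOTSUP, in which case an ordinary write is used instead) can be sketched as follows. This is a toy model under stated assumptions: demo_driver, its two callbacks and demo_write_zeroes() are hypothetical and much simpler than the real bdrv_co_do_pwrite_zeroes().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a driver table; not QEMU's BlockDriver. */
struct demo_driver {
    /* Optional fast path: may be NULL or may return -ENOTSUP. */
    int (*pwrite_zeroes)(int64_t offset, int count);
    /* Mandatory slow path: writes an explicit buffer of zeroes. */
    int (*pwrite_buffer)(int64_t offset, int count);
};

static int demo_slow_calls;   /* counts how often the fallback ran */

static int demo_write_zeroes(const struct demo_driver *drv,
                             int64_t offset, int count)
{
    int ret = -ENOTSUP;

    assert(drv != NULL && drv->pwrite_buffer != NULL);
    if (drv->pwrite_zeroes) {
        ret = drv->pwrite_zeroes(offset, count);
    }
    if (ret == -ENOTSUP) {
        /* Fall back to an ordinary write of a zero-filled buffer. */
        ret = drv->pwrite_buffer(offset, count);
    }
    return ret;
}

/* Toy callbacks used to exercise both routes into the fallback. */
static int demo_fast_unsupported(int64_t offset, int count)
{
    (void)offset;
    (void)count;
    return -ENOTSUP;
}

static int demo_slow_ok(int64_t offset, int count)
{
    (void)offset;
    (void)count;
    demo_slow_calls++;
    return 0;
}

static const struct demo_driver demo_no_fast_path = {
    NULL, demo_slow_ok
};
static const struct demo_driver demo_fast_path_refuses = {
    demo_fast_unsupported, demo_slow_ok
};
```

With a single byte-based callback left in the driver table, the dispatcher needs only one branch for the fast path, which is the simplification this last patch makes.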

* Re: [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes Eric Blake
@ 2016-05-25 10:30   ` Kevin Wolf
  2016-05-25 11:21     ` Eric Blake
  0 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 10:30 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Another step towards removing sector-based interfaces: convert
> the maximum write and minimum alignment values from sectorss to

s/sectorss/sectors/

> bytes.  Alignment is changed to 'int', since it makes no sense
> to have an alignment larger than the maximum write.  Add an
> assert that no one was trying to use sectors to get a write
> zeroes larger than 2G.  Rename the variables to let the compiler
> check that all users are converted to the new semantics.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1706,12 +1706,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>      }
> 
>      if (iscsilun->bl.max_ws_len < 0xffffffff) {
> -        bs->bl.max_write_zeroes =
> -            sector_limits_lun2qemu(iscsilun->bl.max_ws_len, iscsilun);
> +        bs->bl.max_pwrite_zeroes = iscsilun->bl.max_ws_len;

Wrong unit, I think. You need to multiply by iscsilun->block_size.

>      }
>      if (iscsilun->lbp.lbpws) {
> -        bs->bl.write_zeroes_alignment =
> -            sector_limits_lun2qemu(iscsilun->bl.opt_unmap_gran, iscsilun);
> +        bs->bl.pwrite_zeroes_alignment = iscsilun->bl.opt_unmap_gran;

Same here.

>      }
>      bs->bl.opt_transfer_length =
>          sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c9306a7..745b66f 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1193,7 +1193,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVQcow2State *s = bs->opaque;
> 
> -    bs->bl.write_zeroes_alignment = s->cluster_sectors;
> +    bs->bl.pwrite_zeroes_alignment = s->cluster_sectors << BDRV_SECTOR_BITS;

This is s->cluster_size. I hope to get rid of s->cluster_sectors
eventually. :-)

Kevin

* Re: [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (12 preceding siblings ...)
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes() Eric Blake
@ 2016-05-25 11:02 ` Kevin Wolf
  2016-06-01 15:35 ` Kevin Wolf
  14 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 11:02 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Kevin pointed out that my recent change to byte-based instead
> of sector-based blk_write_zeroes() (commit 983a1600) makes life
> harder as long as bdrv_write_zeroes is still sector-based, and
> where the compiler doesn't flag any change in parameter types.
> Complete the conversion, by renaming things (so the compiler
> will help flag any future rebase conflicts), and making all
> write_zeroes operations nominally take bytes.
> 
> Definitely conflicts with Denis' qcow2_co_write_zeroes improvements
> series, and probably with Kevin's conversion of block jobs to
> BlockBackend. I can rebase if those land on the block branch first.

I think I'll just pick up your patch 1 and include it in the next
version of my series and then we can let git handle the question whose
copy gets in. The rest of this series shouldn't conflict with mine.

Kevin

* Re: [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes
  2016-05-25 10:30   ` Kevin Wolf
@ 2016-05-25 11:21     ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-25 11:21 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

On 05/25/2016 04:30 AM, Kevin Wolf wrote:
> Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
>> Another step towards removing sector-based interfaces: convert
>> the maximum write and minimum alignment values from sectorss to
> 
> s/sectorss/sectors/
> 
>> bytes.  Alignment is changed to 'int', since it makes no sense
>> to have an alignment larger than the maximum write.  Add an
>> assert that no one was trying to use sectors to get a write
>> zeroes larger than 2G.  Rename the variables to let the compiler
>> check that all users are converted to the new semantics.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -1706,12 +1706,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>>      }
>>
>>      if (iscsilun->bl.max_ws_len < 0xffffffff) {
>> -        bs->bl.max_write_zeroes =
>> -            sector_limits_lun2qemu(iscsilun->bl.max_ws_len, iscsilun);
>> +        bs->bl.max_pwrite_zeroes = iscsilun->bl.max_ws_len;
> 
> Wrong unit, I think. You need to multiply by iscsilun->block_size.

Hmm, I think you're right.  What's more, I need to make sure the result
doesn't wrap around INT_MAX (a device with 4k block size that supports
8G limits via 2M max blocks should still allow up to 2G transactions
from qemu). I'm also thinking that in v2, it may be easier to reason
about alignment limits if I convert alignment numbers to uint32_t,
although we are still capped by INT_MAX in our various blk_* interfaces
(worrying about signed overflow is a pain).

>> +++ b/block/qcow2.c
>> @@ -1193,7 +1193,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>>      BDRVQcow2State *s = bs->opaque;
>>
>> -    bs->bl.write_zeroes_alignment = s->cluster_sectors;
>> +    bs->bl.pwrite_zeroes_alignment = s->cluster_sectors << BDRV_SECTOR_BITS;
> 
> This is s->cluster_size. I hope to get rid of s->cluster_sectors
> eventually. :-)

Should I go ahead and convert ALL of BlockLimits to be byte-based
limits, rather than an odd mix of sector vs. byte limits?  Should I add
any assertions for power-of-2 limits?  Do we want to allow 0x80000000 as
a valid length limit?

[/me Should I be regretting touching this can of worms in the first
place? :) ]

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
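
The unit conversion and INT_MAX concern raised in this reply can be sketched as a small helper. This is only a sketch under stated assumptions: demo_blocks_to_byte_limit() is a hypothetical name, not a function in block/iscsi.c, and rounding the clamped value down to a whole number of blocks is an assumption made here rather than something decided in the thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: convert a limit expressed in LUN blocks into a
 * byte limit that still fits in an int. */
static int demo_blocks_to_byte_limit(uint32_t max_blocks, uint32_t block_size)
{
    uint64_t bytes;

    assert(block_size > 0 && block_size <= INT_MAX);

    /* Two 32-bit operands cannot overflow a 64-bit product. */
    bytes = (uint64_t)max_blocks * block_size;
    if (bytes > INT_MAX) {
        bytes = INT_MAX;
    }
    /* Assumption: keep the limit a whole number of blocks. */
    bytes -= bytes % block_size;
    return (int)bytes;
}
```

For the example in the message, 2M blocks of 4k each describe 8G, which this helper clamps to 2147479552 bytes, that is, 2G minus one 4k block. A small limit such as 8 blocks of 512 bytes passes through unchanged as 4096.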


* Re: [Qemu-devel] [PATCH 03/13] block: Add .bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 03/13] block: Add .bdrv_co_pwrite_zeroes() Eric Blake
@ 2016-05-25 13:02   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 13:02 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Update bdrv_co_do_write_zeroes() to be byte-based, and select
> between the new byte-based bdrv_co_pwrite_zeroes() or the old
> bdrv_co_write_zeroes().  The next patches will convert drivers,
> then remove the old interface.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/block/block_int.h |  4 ++-
>  block/io.c                | 81 +++++++++++++++++++++++++----------------------
>  2 files changed, 47 insertions(+), 38 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 4282ffd..fa7e3f9 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -165,6 +165,8 @@ struct BlockDriver {
>       */
>      int coroutine_fn (*bdrv_co_write_zeroes)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
> +    int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
> +        int64_t offset, int count, BdrvRequestFlags flags);
>      int coroutine_fn (*bdrv_co_discard)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors);
>      int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
> @@ -454,7 +456,7 @@ struct BlockDriverState {
>      unsigned int request_alignment;
>      /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>      unsigned int supported_write_flags;
> -    /* Flags honored during write_zeroes (so far: BDRV_REQ_FUA,
> +    /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
>       * BDRV_REQ_MAY_UNMAP) */
>      unsigned int supported_zero_flags;
> 
> diff --git a/block/io.c b/block/io.c
> index 41b4e9d..c1d700b 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,8 +42,8 @@ static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>                                           void *opaque,
>                                           bool is_write);
>  static void coroutine_fn bdrv_co_do_rw(void *opaque);
> -static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
> -    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags);
> +static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> +    int64_t offset, int count, BdrvRequestFlags flags);
> 
>  static void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> @@ -876,10 +876,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
>          goto err;
>      }
> 
> -    if (drv->bdrv_co_write_zeroes &&
> +    if ((drv->bdrv_co_write_zeroes || drv->bdrv_co_pwrite_zeroes) &&
>          buffer_is_zero(bounce_buffer, iov.iov_len)) {
> -        ret = bdrv_co_do_write_zeroes(bs, cluster_sector_num,
> -                                      cluster_nb_sectors, 0);
> +        ret = bdrv_co_do_pwrite_zeroes(bs,
> +                                       cluster_sector_num * BDRV_SECTOR_SIZE,
> +                                       cluster_nb_sectors * BDRV_SECTOR_SIZE,
> +                                       0);
>      } else {
>          /* This does not change the data on the disk, it is not necessary
>           * to flush even in cache=writethrough mode.
> @@ -1111,8 +1113,8 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
> 
>  #define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768
> 
> -static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
> -    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
> +static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> +    int64_t offset, int count, BdrvRequestFlags flags)
>  {
>      BlockDriver *drv = bs->drv;
>      QEMUIOVector qiov;
> @@ -1121,40 +1123,45 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>      bool need_flush = false;
> 
>      int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
> -    int max_write_zeroes_sectors = max_write_zeroes >> BDRV_SECTOR_BITS;
> -    int write_zeroes_sector_align =
> -        bs->bl.pwrite_zeroes_alignment >> BDRV_SECTOR_BITS;
> +    int alignment = MAX(bs->bl.pwrite_zeroes_alignment, BDRV_SECTOR_SIZE);

Why do we round up to sector granularity? When everything is based on
bytes, this shouldn't be necessary, but even at the end of the series,
this is still done. Shouldn't bs->bl.pwrite_zeroes_alignment ?: 1 be
good enough?

Looks good to me otherwise.

Kevin
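
The difference between the two alignment defaults discussed in this review can be shown with a short sketch. The helper names and DEMO_SECTOR_SIZE are hypothetical. The `x ?: 1` spelling in the review is the GNU C conditional with an omitted middle operand; the portable equivalent `x ? x : 1` is used below.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_SECTOR_SIZE = 512 };

/* What the patch does: never align to less than one sector. */
static uint32_t demo_align_sector_floor(uint32_t reported)
{
    return reported > DEMO_SECTOR_SIZE ? reported : DEMO_SECTOR_SIZE;
}

/* What the review suggests: treat 0 ("no preference") as byte
 * alignment, and otherwise trust the driver's value. */
static uint32_t demo_align_byte_floor(uint32_t reported)
{
    return reported ? reported : 1;
}
```

The two helpers agree whenever the driver reports an alignment of at least 512; they differ only for 0 and for sub-sector values, which is exactly the case the review asks about.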

* Re: [Qemu-devel] [PATCH 04/13] block: Switch bdrv_write_zeroes() to byte interface
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 04/13] block: Switch bdrv_write_zeroes() to byte interface Eric Blake
@ 2016-05-25 13:18   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 13:18 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Jeff Cody, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, Denis V. Lunev, Juan Quintela, Amit Shah

Am 25.05.2016 um 00:25 hat Eric Blake geschrieben:
> Rename to bdrv_pwrite_zeroes() to let the compiler ensure we
> cater to the updated semantics.  Do the same for
> bdrv_aio_write_zeroes() and bdrv_co_write_zeroes().  For now,
> we still require sector alignment in the callers, via assertions.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

> --- a/block/io.c
> +++ b/block/io.c
> @@ -603,18 +603,21 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
>      return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true, 0);
>  }
> 
> -int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> -                      int nb_sectors, BdrvRequestFlags flags)
> +int bdrv_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
> +                       int count, BdrvRequestFlags flags)
>  {
> -    return bdrv_rw_co(bs, sector_num, NULL, nb_sectors, true,
> +    assert(offset % BDRV_SECTOR_SIZE == 0);
> +    assert(count % BDRV_SECTOR_SIZE == 0);
> +    return bdrv_rw_co(bs, offset >> BDRV_SECTOR_BITS, NULL,
> +                      count >> BDRV_SECTOR_BITS, true,
>                        BDRV_REQ_ZERO_WRITE | flags);
>  }

Should we go directly to bdrv_prwv_co() here so that we don't need to
assert BDRV_SECTOR_SIZE alignment in a byte-based function?

> -BlockAIOCB *bdrv_aio_write_zeroes(BlockDriverState *bs,
> -        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags,
> +BlockAIOCB *bdrv_aio_pwrite_zeroes(BlockDriverState *bs,
> +        int64_t offset, int count, BdrvRequestFlags flags,
>          BlockCompletionFunc *cb, void *opaque)
>  {
> -    trace_bdrv_aio_write_zeroes(bs, sector_num, nb_sectors, flags, opaque);
> +    trace_bdrv_aio_pwrite_zeroes(bs, offset, count, flags, opaque);
> +    assert(offset % BDRV_SECTOR_SIZE == 0);
> +    assert(count % BDRV_SECTOR_SIZE == 0);
> 
> -    return bdrv_co_aio_rw_vector(bs, sector_num, NULL, nb_sectors,
> +    return bdrv_co_aio_rw_vector(bs, offset >> BDRV_SECTOR_BITS, NULL,
> +                                 count >> BDRV_SECTOR_BITS,
>                                   BDRV_REQ_ZERO_WRITE | flags,
>                                   cb, opaque, true);
>  }

Here the same would be nice, but we don't have a byte-based AIO
interface yet, so I'd agree with leaving the assertion here.

Kevin

* Re: [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes() Eric Blake
@ 2016-05-25 13:34   ` Kevin Wolf
  2016-06-01 16:33     ` Eric Blake
  0 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 13:34 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Ronnie Sahlberg, Paolo Bonzini,
	Peter Lieven, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> As this is the first byte-based iscsi interface, convert
> is_request_lun_aligned() into two versions, one for sectors
> and one for bytes.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/iscsi.c | 53 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 31 insertions(+), 22 deletions(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 0acc3dc..3dbfd57 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -401,18 +401,25 @@ static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>      return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>  }
> 
> -static bool is_request_lun_aligned(int64_t sector_num, int nb_sectors,
> -                                      IscsiLun *iscsilun)
> +static bool is_byte_request_lun_aligned(int64_t offset, int count,
> +                                        IscsiLun *iscsilun)
>  {
> -    if ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
> -        (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) {
> -            error_report("iSCSI misaligned request: "
> -                         "iscsilun->block_size %u, sector_num %" PRIi64
> -                         ", nb_sectors %d",
> -                         iscsilun->block_size, sector_num, nb_sectors);
> -            return 0;
> +    if (offset % iscsilun->block_size || count % iscsilun->block_size) {
> +        error_report("iSCSI misaligned request: "
> +                     "iscsilun->block_size %u, offset %" PRIi64
> +                     ", count %d",
> +                     iscsilun->block_size, offset, count);
> +        return false;
>      }
> -    return 1;
> +    return true;
> +}
> +
> +static bool is_sector_request_lun_aligned(int64_t sector_num, int nb_sectors,
> +                                          IscsiLun *iscsilun)
> +{
> +    return is_byte_request_lun_aligned(sector_num << BDRV_SECTOR_BITS,
> +                                       nb_sectors << BDRV_SECTOR_BITS,
> +                                       iscsilun);
>  }

You're switching from (nb_sectors * BDRV_SECTOR_SIZE) to (nb_sectors <<
BDRV_SECTOR_BITS). The difference is that the former is a 64 bit
calculation because BDRV_SECTOR_SIZE is unsigned long long, whereas the
latter is a 32 bit calculation.

Fortunately, it seems to me that all input values come directly from the
block layer which already limits requests to BDRV_REQUEST_MAX_SECTORS.
So we should be safe from overflows here.
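
To make the width difference concrete, here is a minimal standalone sketch
(not QEMU code; SECTOR_BITS, SECTOR_SIZE and the helper names are stand-ins
assumed for illustration, mirroring a size macro with a ULL suffix):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the sector macros; the ULL suffix on the size is what
 * turns the multiplication below into a 64-bit calculation. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1ULL << SECTOR_BITS)

/* int * unsigned long long: the int operand is converted, so the
 * product is computed in 64 bits. */
uint64_t bytes_by_multiply(int nb_sectors)
{
    return nb_sectors * SECTOR_SIZE;
}

/* int << int: the result stays an int, so the input must be bounded
 * before shifting to avoid signed overflow. */
bool shift_fits_in_int(int nb_sectors)
{
    return nb_sectors >= 0 && nb_sectors <= (INT_MAX >> SECTOR_BITS);
}

int bytes_by_shift(int nb_sectors)
{
    assert(shift_fits_in_int(nb_sectors));
    return nb_sectors << SECTOR_BITS;
}
```

With these definitions, bytes_by_shift(8) yields 4096, and any sector count
above INT_MAX >> 9 is rejected by the assertion rather than overflowing.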

>  static int
> -coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> -                                   int nb_sectors, BdrvRequestFlags flags)
> +coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
> +                                    int count, BdrvRequestFlags flags)
>  {
>      IscsiLun *iscsilun = bs->opaque;
>      struct IscsiTask iTask;
> @@ -978,7 +985,7 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
>      uint32_t nb_blocks;
>      bool use_16_for_ws = iscsilun->use_16_for_rw;
> 
> -    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
> +    if (!is_byte_request_lun_aligned(offset, count, iscsilun)) {
>          return -EINVAL;
>      }

Should this become -ENOTSUP so that emulation can take over rather than
failing the request?
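
As a rough illustration of what "emulation can take over" means, here is a
self-contained sketch with an in-memory stand-in for the device. All names
(disk, LUN_BLOCK_SIZE, driver_pwrite_zeroes, generic_pwrite_zeroes) are
hypothetical and only model the idea; this is not the actual block layer
code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory "LUN" with a 4096-byte block size. */
enum { LUN_BLOCK_SIZE = 4096, DISK_SIZE = 4 * LUN_BLOCK_SIZE };
unsigned char disk[DISK_SIZE];

/* Driver fast path: only block-aligned requests are handled. */
int driver_pwrite_zeroes(int64_t offset, int count)
{
    if (offset % LUN_BLOCK_SIZE || count % LUN_BLOCK_SIZE) {
        return -ENOTSUP; /* "cannot do this efficiently", not a failure */
    }
    memset(disk + offset, 0, count);
    return 0;
}

/* Generic layer: on -ENOTSUP, emulate by writing a buffer of zeroes. */
int generic_pwrite_zeroes(int64_t offset, int count)
{
    static const unsigned char zeroes[LUN_BLOCK_SIZE];
    int ret;

    if (offset < 0 || count < 0 || offset > DISK_SIZE - count) {
        return -EINVAL;
    }
    ret = driver_pwrite_zeroes(offset, count);
    if (ret == -ENOTSUP) {
        while (count > 0) {
            int n = count < LUN_BLOCK_SIZE ? count : LUN_BLOCK_SIZE;
            memcpy(disk + offset, zeroes, n);
            offset += n;
            count -= n;
        }
        ret = 0;
    }
    return ret;
}
```

With -EINVAL in the driver, the misaligned request would simply fail; with
-ENOTSUP the caller still gets its range zeroed, only more slowly.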

We should probably also always set bs->bl.pwrite_zeroes_alignment, with
a fallback to iscsilun->block_size if we don't have iscsilun->lbp.lbpws.
But that's a separate patch.

Kevin

* Re: [Qemu-devel] [PATCH 06/13] qcow2: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 06/13] qcow2: " Eric Blake
@ 2016-05-25 13:53   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 13:53 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> There are still opportunities to optimize the qcow2 handling
> of zero clusters.  For example, if the backing file only has
> non-zero data in the portion about to be overwritten, then
> we could widen the request and make the entire cluster zero,
> rather than falling back to -ENOTSUP.  But for this patch,
> intentionally leave the semantics unchanged, even if not
> optimal.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH 07/13] blkreplay: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 07/13] blkreplay: " Eric Blake
@ 2016-05-25 13:54   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 13:54 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH 08/13] gluster: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 08/13] gluster: " Eric Blake
@ 2016-05-25 13:57   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 13:57 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Jeff Cody, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/gluster.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index a8aaacf..15aff4b 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -454,14 +454,13 @@ static void qemu_gluster_reopen_abort(BDRVReopenState *state)
>  }
> 
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> -static coroutine_fn int qemu_gluster_co_write_zeroes(BlockDriverState *bs,
> -        int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
> +static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
> +        int64_t offset, int count, BdrvRequestFlags flags)
>  {
>      int ret;
>      GlusterAIOCB acb;
>      BDRVGlusterState *s = bs->opaque;
> -    off_t size = nb_sectors * BDRV_SECTOR_SIZE;
> -    off_t offset = sector_num * BDRV_SECTOR_SIZE;
> +    off_t size = count;

This variable isn't really necessary. Up to you whether you want to
change it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH 09/13] qed: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 09/13] qed: " Eric Blake
@ 2016-05-25 14:07   ` Kevin Wolf
  2016-05-25 14:28     ` Eric Blake
  0 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 14:07 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Kill an abuse of the comma operator while at it (fortunately,
> the semantics were still right).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/qed.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/block/qed.c b/block/qed.c
> index 0ab5b40..a0be886 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -1419,7 +1419,7 @@ typedef struct {
>      bool done;
>  } QEDWriteZeroesCB;
> 
> -static void coroutine_fn qed_co_write_zeroes_cb(void *opaque, int ret)
> +static void coroutine_fn qed_co_pwrite_zeroes_cb(void *opaque, int ret)
>  {
>      QEDWriteZeroesCB *cb = opaque;
> 
> @@ -1430,10 +1430,10 @@ static void coroutine_fn qed_co_write_zeroes_cb(void *opaque, int ret)
>      }
>  }
> 
> -static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
> -                                                 int64_t sector_num,
> -                                                 int nb_sectors,
> -                                                 BdrvRequestFlags flags)
> +static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
> +                                                  int64_t offset,
> +                                                  int count,
> +                                                  BdrvRequestFlags flags)
>  {
>      BlockAIOCB *blockacb;
>      BDRVQEDState *s = bs->opaque;
> @@ -1443,10 +1443,10 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
> 
>      /* Refuse if there are untouched backing file sectors */
>      if (bs->backing) {
> -        if (qed_offset_into_cluster(s, sector_num * BDRV_SECTOR_SIZE) != 0) {
> +        if (qed_offset_into_cluster(s, offset) != 0) {
>              return -ENOTSUP;
>          }
> -        if (qed_offset_into_cluster(s, nb_sectors * BDRV_SECTOR_SIZE) != 0) {
> +        if (qed_offset_into_cluster(s, count) != 0) {
>              return -ENOTSUP;
>          }
>      }

Unaligned requests are only emulated if there is no backing file...

> @@ -1454,12 +1454,13 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
>      /* Zero writes start without an I/O buffer.  If a buffer becomes necessary
>       * then it will be allocated during request processing.
>       */
> -    iov.iov_base = NULL,
> -    iov.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
> +    iov.iov_base = NULL;
> +    iov.iov_len = count;
> 
>      qemu_iovec_init_external(&qiov, &iov, 1);
> -    blockacb = qed_aio_setup(bs, sector_num, &qiov, nb_sectors,
> -                             qed_co_write_zeroes_cb, &cb,
> +    blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, &qiov,
> +                             count >> BDRV_SECTOR_BITS,

...so offset and count can still be unaligned here and we end up zeroing
out the wrong part of the sector. I guess we need to return -ENOTSUP for
all sub-sector requests, even without a backing file.
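
A small standalone sketch of the problem (the struct and function names are
made up for illustration; only the shift-by-sector-bits arithmetic is taken
from the hunk above):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1 << SECTOR_BITS)

/* Half-open byte range [start, end). */
struct byte_range {
    int64_t start;
    int64_t end;
};

/* Range that really gets zeroed when a byte request is shifted down to
 * sector units: the low bits of offset and count are silently dropped. */
struct byte_range range_after_truncation(int64_t offset, int count)
{
    int64_t sector_num = offset >> SECTOR_BITS;
    int nb_sectors = count >> SECTOR_BITS;
    struct byte_range r;

    r.start = sector_num * SECTOR_SIZE;
    r.end = (sector_num + nb_sectors) * SECTOR_SIZE;
    return r;
}

/* Guard to run before the shift: refuse anything that is not a whole
 * number of sectors so the caller can fall back to emulation. */
int check_sector_aligned(int64_t offset, int count)
{
    if (offset % SECTOR_SIZE || count % SECTOR_SIZE) {
        return -ENOTSUP;
    }
    return 0;
}
```

For a request of 512 bytes at offset 100, the caller asked for bytes
[100, 612), but the truncated parameters describe [0, 512): bytes 0..99 are
zeroed by mistake and bytes 512..611 are left untouched.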

> +                             qed_co_pwrite_zeroes_cb, &cb,
>                               QED_AIOCB_WRITE | QED_AIOCB_ZERO);
>      if (!blockacb) {
>          return -EIO;

Kevin

* Re: [Qemu-devel] [PATCH 10/13] raw-posix: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 10/13] raw-posix: " Eric Blake
@ 2016-05-25 14:20   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 14:20 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/raw-posix.c | 37 +++++++++++++++----------------------
>  trace-events      |  2 +-
>  2 files changed, 16 insertions(+), 23 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index a4f5a1b..bb691f6 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1252,8 +1252,7 @@ static int aio_worker(void *arg)
>  }
> 
>  static int paio_submit_co(BlockDriverState *bs, int fd,
> -        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> -        int type)
> +                          int64_t offset, int count, int type)

Removing qiov makes sense if we only want to use the function for
write_zeroes and therefore don't need the full power of paio_submit(). I
still think that it would be good to convert raw-posix to the
(coroutine-based) .bdrv_co_preadv/pwritev and then we will need qiov
again.

Kevin

* Re: [Qemu-devel] [PATCH 11/13] raw_bsd: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 11/13] raw_bsd: " Eric Blake
@ 2016-05-25 14:20   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 14:20 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH 12/13] vmdk: Convert to bdrv_co_pwrite_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 12/13] vmdk: " Eric Blake
@ 2016-05-25 14:23   ` Kevin Wolf
  2016-05-25 14:35     ` Eric Blake
  0 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 14:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Fam Zheng, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Another step on our continuing quest to switch to byte-based
> interfaces.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/vmdk.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 8494d63..284d7a0 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1704,15 +1704,14 @@ static int vmdk_write_compressed(BlockDriverState *bs,
>      }
>  }
> 
> -static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
> -                                             int64_t sector_num,
> -                                             int nb_sectors,
> -                                             BdrvRequestFlags flags)
> +static int coroutine_fn vmdk_co_pwrite_zeroes(BlockDriverState *bs,
> +                                              int64_t offset,
> +                                              int count,
> +                                              BdrvRequestFlags flags)
>  {
>      int ret;
>      BDRVVmdkState *s = bs->opaque;
> -    uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
> -    uint64_t bytes = nb_sectors * BDRV_SECTOR_SIZE;
> +    uint64_t bytes = count;

That's an unnecessary variable again. Whether you decide to change it or
not:

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes()
  2016-05-24 22:25 ` [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes() Eric Blake
@ 2016-05-25 14:24   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 14:24 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz

On 25.05.2016 at 00:25, Eric Blake wrote:
> Now that all drivers have been converted to a byte interface,
> we no longer need a sector interface.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH 09/13] qed: Convert to bdrv_co_pwrite_zeroes()
  2016-05-25 14:07   ` Kevin Wolf
@ 2016-05-25 14:28     ` Eric Blake
  2016-05-25 15:06       ` Kevin Wolf
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2016-05-25 14:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Max Reitz

On 05/25/2016 08:07 AM, Kevin Wolf wrote:
> On 25.05.2016 at 00:25, Eric Blake wrote:
>> Another step on our continuing quest to switch to byte-based
>> interfaces.
>>
>> Kill an abuse of the comma operator while at it (fortunately,
>> the semantics were still right).
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block/qed.c | 25 +++++++++++++------------
>>  1 file changed, 13 insertions(+), 12 deletions(-)
>>

>> @@ -1443,10 +1443,10 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
>>
>>      /* Refuse if there are untouched backing file sectors */

The comment wasn't very helpful, so I may reword it, too
(s/untouched/unaligned/, or something like that)

>>      if (bs->backing) {
>> -        if (qed_offset_into_cluster(s, sector_num * BDRV_SECTOR_SIZE) != 0) {
>> +        if (qed_offset_into_cluster(s, offset) != 0) {
>>              return -ENOTSUP;
>>          }
>> -        if (qed_offset_into_cluster(s, nb_sectors * BDRV_SECTOR_SIZE) != 0) {
>> +        if (qed_offset_into_cluster(s, count) != 0) {
>>              return -ENOTSUP;
>>          }
>>      }
> 
> Unaligned requests are only emulated if there is no backing file...
> 
>> @@ -1454,12 +1454,13 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
>>      /* Zero writes start without an I/O buffer.  If a buffer becomes necessary
>>       * then it will be allocated during request processing.
>>       */
>> -    iov.iov_base = NULL,
>> -    iov.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
>> +    iov.iov_base = NULL;
>> +    iov.iov_len = count;
>>
>>      qemu_iovec_init_external(&qiov, &iov, 1);
>> -    blockacb = qed_aio_setup(bs, sector_num, &qiov, nb_sectors,
>> -                             qed_co_write_zeroes_cb, &cb,
>> +    blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, &qiov,
>> +                             count >> BDRV_SECTOR_BITS,
> 
> ...so offset and count can still be unaligned here and we end up zeroing
> out the wrong part of the sector. I guess we need to return -ENOTSUP for
> all sub-sector requests, even without a backing file.

Hmm. Wouldn't it be nicer if we could guarantee that blk_pwrite_zeroes()
will never call bdrv_co_pwrite_zeroes() with less than
request_alignment?  That is, if the block layer takes care of
read-modify-write for any unaligned byte offset less than
request_alignment, then the driver only has to worry about sector
alignment.  Except qed.c doesn't seem to set request_alignment, but is
just relying on io.c currently setting it to MAX(BDRV_SECTOR_SIZE,
bs->bl.request_alignment) everywhere. (And the fact that
request_alignment is a sibling rather than a member to BlockLimits bl is
awkward.)

Maybe we want three limits in BlockLimits, rather than two: the current
max_pwrite_zeroes does a good job at saying how small blk_pwrite_zeroes
must fragment large requests, and pwrite_zeroes_alignment does a good
job at saying how large a request must be to potentially punch a hole,
but at least in the case of qcow2, where we want to optimize a partial
write to potentially zeroing an entire cluster, we still want to limit
things to sector boundaries when checking for whether the rest of the
cluster already reads as zeroes, whether or not we also want to support
request_alignment of 1 instead of 512.

There are other drivers that I touched in this series that were relying
on the fact that the block layer currently guarantees sector alignment,
and maybe they should be setting request_alignment, or maybe we want to
add yet another BlockLimit member.  So even if we want normal read/write
to allow request_alignment of 1 in the case where we don't need the
block layer to do a read-modify-write, I'm still wondering whether we
want the write_zeroes engine to have a different minimum alignment and
ALWAYS hand off to normal read-modify-write for anything smaller.
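
A sketch of the kind of split I have in mind, as standalone C (the struct
and function names are hypothetical, and 'align' stands in for whatever
minimum the write_zeroes engine ends up advertising): the unaligned head and
tail would go through normal read-modify-write, and only the aligned middle
would reach the driver's write_zeroes callback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the three pieces of one zero request. */
struct zero_split {
    int64_t head_len;   /* unaligned bytes before the first boundary */
    int64_t mid_offset; /* start of the aligned middle */
    int64_t mid_len;    /* aligned bytes the driver can zero directly */
    int64_t tail_len;   /* unaligned bytes after the last boundary */
};

struct zero_split split_zero_request(int64_t offset, int64_t count,
                                     int64_t align)
{
    struct zero_split s = { 0, 0, 0, 0 };
    int64_t end, first_aligned, last_aligned;

    assert(offset >= 0 && count >= 0 && align > 0);
    end = offset + count;
    first_aligned = (offset + align - 1) / align * align; /* round up */
    last_aligned = end / align * align;                   /* round down */

    if (first_aligned > last_aligned) {
        /* The request sits strictly inside one block: all of it is
         * handled by read-modify-write. */
        s.head_len = count;
        s.mid_offset = end;
        return s;
    }
    s.head_len = first_aligned - offset;
    s.mid_offset = first_aligned;
    s.mid_len = last_aligned - first_aligned;
    s.tail_len = end - last_aligned;
    return s;
}
```

For example, zeroing 2000 bytes at offset 100 with align 512 gives a
412-byte head, a 1536-byte middle starting at 512, and a 52-byte tail.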

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH 12/13] vmdk: Convert to bdrv_co_pwrite_zeroes()
  2016-05-25 14:23   ` Kevin Wolf
@ 2016-05-25 14:35     ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-05-25 14:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Fam Zheng, Max Reitz

On 05/25/2016 08:23 AM, Kevin Wolf wrote:
> On 25.05.2016 at 00:25, Eric Blake wrote:
>> Another step on our continuing quest to switch to byte-based
>> interfaces.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block/vmdk.c | 13 ++++++-------
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index 8494d63..284d7a0 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1704,15 +1704,14 @@ static int vmdk_write_compressed(BlockDriverState *bs,
>>      }
>>  }
>>
>> -static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
>> -                                             int64_t sector_num,
>> -                                             int nb_sectors,
>> -                                             BdrvRequestFlags flags)
>> +static int coroutine_fn vmdk_co_pwrite_zeroes(BlockDriverState *bs,
>> +                                              int64_t offset,
>> +                                              int count,
>> +                                              BdrvRequestFlags flags)
>>  {
>>      int ret;
>>      BDRVVmdkState *s = bs->opaque;
>> -    uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
>> -    uint64_t bytes = nb_sectors * BDRV_SECTOR_SIZE;
>> +    uint64_t bytes = count;
> 
> That's an unnecessary variable again. Whether you decide to change it or
> not:
> 
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Unnecessary, except that it is 64-bit instead of the block layer
interface 32-bit, and I didn't want to have to think too hard about how
'bytes' was used in the rest of the function if I used the narrower type
from the get-go.  I also think that 'int count' is fishy, because it
forces us to think about negative values and placating code sanitizers
on undefined shift values; maybe we'd be better with making all byte
interfaces use 'uint32_t' (but still limiting ourselves to 0x80000000 or
2G for any power-of-two limit, and 0xffffffff size transactions would
not be possible if request_alignment is larger than 1).  If we made that
switch, I'd still want to keep 0 as a no-op transaction, and not a
special case for a 4G transaction.  Still, now might be the time to do it.
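
To spell out the arithmetic on the uint32_t idea, a tiny standalone sketch
(max_aligned_request is a made-up helper name, and 'align' is assumed to be
a power of two):

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte count representable in uint32_t that is still a multiple
 * of 'align' (assumed to be a nonzero power of two). */
uint32_t max_aligned_request(uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return UINT32_MAX & ~(align - 1);
}
```

With align of 1 the full 0xffffffff is usable; with align of 512 the cap
drops to 0xfffffe00; and the largest possible power-of-two alignment itself
is 0x80000000.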

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH 09/13] qed: Convert to bdrv_co_pwrite_zeroes()
  2016-05-25 14:28     ` Eric Blake
@ 2016-05-25 15:06       ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2016-05-25 15:06 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Max Reitz

On 25.05.2016 at 16:28, Eric Blake wrote:
> On 05/25/2016 08:07 AM, Kevin Wolf wrote:
> > On 25.05.2016 at 00:25, Eric Blake wrote:
> >> Another step on our continuing quest to switch to byte-based
> >> interfaces.
> >>
> >> Kill an abuse of the comma operator while at it (fortunately,
> >> the semantics were still right).
> >>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> >> ---
> >>  block/qed.c | 25 +++++++++++++------------
> >>  1 file changed, 13 insertions(+), 12 deletions(-)
> >>
> 
> >> @@ -1443,10 +1443,10 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
> >>
> >>      /* Refuse if there are untouched backing file sectors */
> 
> The comment wasn't very helpful, so I may reword it, too
> (s/untouched/unaligned/, or something like that)

I think "unaligned" is not what the comment means. What would an
unaligned sector be anyway?

The idea is probably like in qcow2 that if there is no backing file, the
rest of the cluster already reads as zeros, so we can overwrite it.

> >>      if (bs->backing) {
> >> -        if (qed_offset_into_cluster(s, sector_num * BDRV_SECTOR_SIZE) != 0) {
> >> +        if (qed_offset_into_cluster(s, offset) != 0) {
> >>              return -ENOTSUP;
> >>          }
> >> -        if (qed_offset_into_cluster(s, nb_sectors * BDRV_SECTOR_SIZE) != 0) {
> >> +        if (qed_offset_into_cluster(s, count) != 0) {
> >>              return -ENOTSUP;
> >>          }
> >>      }
> > 
> > Unaligned requests are only emulated if there is no backing file...
> > 
> >> @@ -1454,12 +1454,13 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
> >>      /* Zero writes start without an I/O buffer.  If a buffer becomes necessary
> >>       * then it will be allocated during request processing.
> >>       */
> >> -    iov.iov_base = NULL,
> >> -    iov.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
> >> +    iov.iov_base = NULL;
> >> +    iov.iov_len = count;
> >>
> >>      qemu_iovec_init_external(&qiov, &iov, 1);
> >> -    blockacb = qed_aio_setup(bs, sector_num, &qiov, nb_sectors,
> >> -                             qed_co_write_zeroes_cb, &cb,
> >> +    blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, &qiov,
> >> +                             count >> BDRV_SECTOR_BITS,
> > 
> > ...so offset and count can still be unaligned here and we end up zeroing
> > out the wrong part of the sector. I guess we need to return -ENOTSUP for
> > all sub-sector requests, even without a backing file.
> 
> Hmm. Wouldn't it be nicer if we could guarantee that blk_pwrite_zeroes()
> will never call bdrv_co_pwrite_zeroes() with less than
> request_alignment?

If we want this restriction, the right place to implement it is in
bdrv_co_pwrite_zeroes() before calling into the driver, not on the
BlockBackend level.

> That is, if the block layer takes care of
> read-modify-write for any unaligned byte offset less than
> request_alignment, then the driver only has to worry about sector
> alignment.  Except qed.c doesn't seem to set request_alignment, but is
> just relying on io.c currently setting it to MAX(BDRV_SECTOR_SIZE,
> bs->bl.request_alignment) everywhere. (And the fact that
> request_alignment is a sibling rather than a member to BlockLimits bl is
> awkward.)

As explained on IRC, the block driver shouldn't care about sector
alignment. 512 is just an arbitrary number that some interfaces
misguidedly happen to use as their unit for parameters. The QED request
struct is one of them, but there is no real reason for it. There is no
metadata that we have at a sector granularity.

So I would avoid baking assumptions of bad drivers into new interfaces.

> Maybe we want three limits in BlockLimits, rather than two: the current
> max_pwrite_zeroes does a good job at saying how small blk_pwrite_zeroes
> must fragment large requests, and pwrite_zeroes_alignment does a good
> job at saying how large a request must be to potentially punch a hole,
> but at least in the case of qcow2, where we want to optimize a partial
> write to potentially zeroing an entire cluster, we still want to limit
> things to sector boundaries when checking for whether the rest of the
> cluster already reads as zeroes, whether or not we also want to support
> request_alignment of 1 instead of 512.

For qcow2, the only reason that sectors are involved in the optimisation
is that that's the granularity of bdrv_get_block_status(). Once that is
fixed, qcow2 can use bytes for its optimisation.

Until then, if we decided that we don't want to check the full cluster
any more (clusters are always aligned to sectors) but only the area that
is not overwritten, it would have to round start and end of the request
to the sector boundary.
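
Roughly like this, as a standalone sketch (ALIGN_DOWN, ALIGN_UP and
round_to_sectors are local stand-ins defined here for illustration, not
existing helpers): the start is rounded down and the end is rounded up, so
the sector range always covers the whole byte request.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Local rounding helpers for non-negative values. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/* Widen the byte range [offset, offset + count) outward to sector
 * boundaries and express it in sector units. */
void round_to_sectors(int64_t offset, int64_t count,
                      int64_t *sector_num, int64_t *nb_sectors)
{
    int64_t start, end;

    assert(offset >= 0 && count >= 0);
    start = ALIGN_DOWN(offset, SECTOR_SIZE);
    end = ALIGN_UP(offset + count, SECTOR_SIZE);
    *sector_num = start / SECTOR_SIZE;
    *nb_sectors = (end - start) / SECTOR_SIZE;
}
```

A request for 1000 bytes at offset 100 covers bytes [100, 1100), which
rounds out to [0, 1536), i.e. three sectors starting at sector 0.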

> There are other drivers that I touched in this series that were relying
> on the fact that the block layer currently guarantees sector alignment,
> and maybe they should be setting request_alignment, or maybe we want to
> add yet another BlockLimit member.  So even if we want normal read/write
> to allow request_alignment of 1 in the case where we don't need the
> block layer to do a read-modify-write, I'm still wondering whether we
> want the write_zeroes engine to have a different minimum alignment and
> ALWAYS hand off to normal read-modify-write for anything smaller.

Do you have an example where the restriction to full sectors is
fundamental and not just a shortcoming of the implementation? As long as
it's only the latter, I think using -ENOTSUP to deal with it until we
fix it for good is fine.

Kevin

* Re: [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes
  2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
                   ` (13 preceding siblings ...)
  2016-05-25 11:02 ` [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Kevin Wolf
@ 2016-06-01 15:35 ` Kevin Wolf
  2016-06-01 15:38   ` Eric Blake
  14 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2016-06-01 15:35 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block

On 25.05.2016 at 00:25, Eric Blake wrote:
> Kevin pointed out that my recent change to byte-based instead
> of sector-based blk_write_zeroes() (commit 983a1600) makes life
> harder as long as bdrv_write_zeroes is still sector-based, and
> where the compiler doesn't flag any change in parameter types.
> Complete the conversion, by renaming things (so the compiler
> will help flag any future rebase conflicts), and making all
> write_zeroes operations nominally take bytes.
> 
> Definitely conflicts with Denis' qcow2_co_write_zeroes improvements
> series, and probably with Kevin's conversion of block jobs to
> BlockBackend. I can rebase if those land on the block branch first.

Do you have an idea when you'll have the time to send a v2? It seems
that whatever I start working on at the moment, I always end up with a
dependency on this series. :-)

Kevin

* Re: [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes
  2016-06-01 15:35 ` Kevin Wolf
@ 2016-06-01 15:38   ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-06-01 15:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block

On 06/01/2016 09:35 AM, Kevin Wolf wrote:
> On 25.05.2016 at 00:25, Eric Blake wrote:
>> Kevin pointed out that my recent change to byte-based instead
>> of sector-based blk_write_zeroes() (commit 983a1600) makes life
>> harder as long as bdrv_write_zeroes is still sector-based, and
>> where the compiler doesn't flag any change in parameter types.
>> Complete the conversion, by renaming things (so the compiler
>> will help flag any future rebase conflicts), and making all
>> write_zeroes operations nominally take bytes.
>>
>> Definitely conflicts with Denis' qcow2_co_write_zeroes improvements
>> series, and probably with Kevin's conversion of block jobs to
>> BlockBackend. I can rebase if those land on the block branch first.
> 
> Do you have an idea when you'll have the time to send a v2? It seems
> that whatever I start working on at the moment, I always end up with a
> dependency on this series. :-)

Coincidentally enough, today :)  I'm just making one last pass through
my edits to make sure v2 is ready for posting.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes()
  2016-05-25 13:34   ` Kevin Wolf
@ 2016-06-01 16:33     ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2016-06-01 16:33 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Ronnie Sahlberg, Paolo Bonzini,
	Peter Lieven, Max Reitz

On 05/25/2016 07:34 AM, Kevin Wolf wrote:
> On 25.05.2016 at 00:25, Eric Blake wrote:
>> Another step on our continuing quest to switch to byte-based
>> interfaces.
>>
>> As this is the first byte-based iscsi interface, convert
>> is_request_lun_aligned() into two versions, one for sectors
>> and one for bytes.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block/iscsi.c | 53 +++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 31 insertions(+), 22 deletions(-)
>>

>> +static bool is_sector_request_lun_aligned(int64_t sector_num, int nb_sectors,
>> +                                          IscsiLun *iscsilun)
>> +{
>> +    return is_byte_request_lun_aligned(sector_num << BDRV_SECTOR_BITS,
>> +                                       nb_sectors << BDRV_SECTOR_BITS,
>> +                                       iscsilun);
>>  }
> 
> You're switching from (nb_sectors * BDRV_SECTOR_SIZE) to (nb_sectors <<
> BDRV_SECTOR_BITS). The difference is that the former is a 64 bit
> calculation because BDRV_SECTOR_SIZE is unsigned long long, whereas the
> latter is a 32 bit calculation.
> 
> Fortunately, it seems to me that all input values come directly from the
> block layer which already limits requests to BDRV_REQUEST_MAX_SECTORS.
> So we should be safe from overflows here.

Still, it won't hurt to add an assert.
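
To make the width concern concrete, here is a minimal standalone sketch
(not the code from the patch). The two macros follow the usual block
layer definitions; the request cap and both helper names are
assumptions made up for illustration. It shows the guard-then-widen
pattern, plus what a 32-bit shift keeps once the byte count no longer
fits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
/* Assumed cap for this sketch: the largest sector count whose byte
 * length still fits in an int. */
#define BDRV_REQUEST_MAX_SECTORS (INT_MAX >> BDRV_SECTOR_BITS)

/* Hypothetical helper: assert the limit, then widen before shifting so
 * the arithmetic is done in 64 bits. */
static int64_t sectors_to_bytes(int nb_sectors)
{
    assert(nb_sectors >= 0 && nb_sectors <= BDRV_REQUEST_MAX_SECTORS);
    return (int64_t)nb_sectors << BDRV_SECTOR_BITS;
}

/* Hypothetical helper: what a 32-bit shift keeps. Unsigned types are
 * used so that the wraparound is well defined in this demonstration. */
static uint32_t shift_32bit(uint32_t nb_sectors)
{
    return (uint32_t)(nb_sectors << BDRV_SECTOR_BITS);
}
```

With these definitions, (1u << 23) * BDRV_SECTOR_SIZE is 4 GiB in the
64-bit calculation, while shift_32bit(1u << 23) wraps to 0. The assert
in sectors_to_bytes() rules such inputs out before the shift happens.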

>> @@ -978,7 +985,7 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
>>      uint32_t nb_blocks;
>>      bool use_16_for_ws = iscsilun->use_16_for_rw;
>>
>> -    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
>> +    if (!is_byte_request_lun_aligned(offset, count, iscsilun)) {
>>          return -EINVAL;
>>      }
> 
> Should this become -ENOTSUP so that emulation can take over rather than
> failing the request?

It's still -EINVAL on unaligned write requests; then again, the block
layer guarantees that it will honor bs->request_alignment for write
requests, even on RMW for write-zeroes fallbacks.  So switching to
-ENOTSUP makes sense.
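
As a rough illustration of why -ENOTSUP is the friendlier return value,
the sketch below mocks up a driver fast path and a generic caller. All
names here (MockLun, mock_driver_pwrite_zeroes,
mock_block_pwrite_zeroes) are hypothetical and are not QEMU's API. The
point is only the control flow: -ENOTSUP lets the caller emulate the
request with an ordinary write of zeroes, whereas -EINVAL would fail it
outright.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for driver state; not QEMU's IscsiLun. */
typedef struct MockLun {
    uint32_t block_size;   /* LUN block size in bytes */
    uint8_t *data;         /* backing storage */
    int64_t size;          /* size of the backing storage in bytes */
} MockLun;

static bool is_byte_request_aligned(int64_t offset, int64_t count,
                                    const MockLun *lun)
{
    assert(lun->block_size > 0);
    return offset % lun->block_size == 0 && count % lun->block_size == 0;
}

/* Driver fast path: handles only block-aligned requests. */
static int mock_driver_pwrite_zeroes(MockLun *lun, int64_t offset,
                                     int64_t count)
{
    if (!is_byte_request_aligned(offset, count, lun)) {
        return -ENOTSUP;   /* ask the caller to emulate instead */
    }
    memset(lun->data + offset, 0, (size_t)count);
    return 0;
}

/* Generic caller: on -ENOTSUP, fall back to writing a zero buffer. */
static int mock_block_pwrite_zeroes(MockLun *lun, int64_t offset,
                                    int64_t count, bool *used_fallback)
{
    *used_fallback = false;
    if (offset < 0 || count < 0 || offset > lun->size - count) {
        return -EINVAL;    /* genuinely invalid request */
    }
    int ret = mock_driver_pwrite_zeroes(lun, offset, count);
    if (ret != -ENOTSUP) {
        return ret;
    }
    uint8_t *zeroes = calloc(1, count ? (size_t)count : 1);
    if (!zeroes) {
        return -ENOMEM;
    }
    memcpy(lun->data + offset, zeroes, (size_t)count);
    free(zeroes);
    *used_fallback = true;
    return 0;
}
```

In this mock, a request at offset 1030 with count 100 on a 512-byte
block LUN still succeeds through the fallback, while an aligned request
takes the fast path.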

> 
> We should probably also always set bs->bl.pwrite_zeroes_alignment, with
> a fallback to iscsilun->block_size if we don't have iscsilun->lbp.lbpws.
> But that's a separate patch.

Yes, added as a separate patch.
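
For reference, the fallback Kevin describes could look roughly like the
sketch below. This is not the follow-up patch itself. The struct, its
field names and pick_pwrite_zeroes_alignment() are hypothetical, and
using an unmap granularity (in blocks) when lbpws is set is an
assumption of this sketch. Only the fallback to the LUN block size
comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of LUN properties; not QEMU's IscsiLun. */
typedef struct MockLunLimits {
    uint32_t block_size;      /* LUN block size in bytes */
    bool lbpws;               /* target supports WRITE SAME with unmap */
    uint32_t opt_unmap_gran;  /* assumed: optimal unmap granularity, in blocks */
} MockLunLimits;

/* Always report an alignment: use the unmap granularity when lbpws is
 * available, otherwise fall back to the plain block size. */
static uint32_t pick_pwrite_zeroes_alignment(const MockLunLimits *lun)
{
    assert(lun->block_size > 0);
    if (lun->lbpws && lun->opt_unmap_gran > 0) {
        return lun->opt_unmap_gran * lun->block_size;
    }
    return lun->block_size;
}
```

With a value always set, the block layer has enough information to
split or pad write-zeroes requests itself before they reach the driver.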

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


Thread overview: 34+ messages
2016-05-24 22:25 [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Eric Blake
2016-05-24 22:25 ` [Qemu-devel] [PATCH 01/13] block: Rename blk_write_zeroes() Eric Blake
2016-05-24 22:25 ` [Qemu-devel] [PATCH 02/13] block: Track write zero limits in bytes Eric Blake
2016-05-25 10:30   ` Kevin Wolf
2016-05-25 11:21     ` Eric Blake
2016-05-24 22:25 ` [Qemu-devel] [PATCH 03/13] block: Add .bdrv_co_pwrite_zeroes() Eric Blake
2016-05-25 13:02   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 04/13] block: Switch bdrv_write_zeroes() to byte interface Eric Blake
2016-05-25 13:18   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 05/13] iscsi: Convert to bdrv_co_pwrite_zeroes() Eric Blake
2016-05-25 13:34   ` Kevin Wolf
2016-06-01 16:33     ` Eric Blake
2016-05-24 22:25 ` [Qemu-devel] [PATCH 06/13] qcow2: " Eric Blake
2016-05-25 13:53   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 07/13] blkreplay: " Eric Blake
2016-05-25 13:54   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 08/13] gluster: " Eric Blake
2016-05-25 13:57   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 09/13] qed: " Eric Blake
2016-05-25 14:07   ` Kevin Wolf
2016-05-25 14:28     ` Eric Blake
2016-05-25 15:06       ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 10/13] raw-posix: " Eric Blake
2016-05-25 14:20   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 11/13] raw_bsd: " Eric Blake
2016-05-25 14:20   ` Kevin Wolf
2016-05-24 22:25 ` [Qemu-devel] [PATCH 12/13] vmdk: " Eric Blake
2016-05-25 14:23   ` Kevin Wolf
2016-05-25 14:35     ` Eric Blake
2016-05-24 22:25 ` [Qemu-devel] [PATCH 13/13] block: Kill bdrv_co_write_zeroes() Eric Blake
2016-05-25 14:24   ` Kevin Wolf
2016-05-25 11:02 ` [Qemu-devel] [PATCH 00/13] Kill sector-based write_zeroes Kevin Wolf
2016-06-01 15:35 ` Kevin Wolf
2016-06-01 15:38   ` Eric Blake
