All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/17] Byte-based block limits
@ 2016-06-14 21:30 Eric Blake
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev() Eric Blake
                   ` (17 more replies)
  0 siblings, 18 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf

BlockLimits is currently an ugly mix of byte limits vs.
sector limits.  Unify it.  Fix some bugs I found in
bdrv_aligned_preadv() while at it.

Prequisite: Kevin's ongoing work to migrate bdrv_aligned_preadv()
to be byte-based (commit 3de06b2 on his vmstate branch at the
time of this email, but that gets rebased):
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html

Trivial contextual conflict in nbd.h with the pull request Paolo
will soon be posting (both series add a #define near the same
line; resolution is to add both):
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03333.html

Also available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-limits-v2

Since v1:
- drop things already done in Kevin's work
- rebase
- split out lots of cleanup work to bdrv_refresh_limits() so
that qemu-iotests does not gain any problems on 77

001/17:[down] 'block: Tighter assertions on bdrv_aligned_pwritev()'
002/17:[down] 'block: Document supported flags during bdrv_aligned_preadv()'
003/17:[down] 'block: Fix harmless off-by-one in bdrv_aligned_preadv()'
004/17:[down] 'nbd: Allow larger requests'
005/17:[down] 'nbd: Advertise realistic limits to block layer'
006/17:[down] 'iscsi: Advertise realistic limits to block layer'
007/17:[down] 'block: Give nonzero result to blk_get_max_transfer_length()'
008/17:[down] 'blkdebug: Set request_alignment during .bdrv_refresh_limits()'
009/17:[down] 'iscsi: Set request_alignment during .bdrv_refresh_limits()'
010/17:[down] 'qcow2: Set request_alignment during .bdrv_refresh_limits()'
011/17:[down] 'raw-win32: Set request_alignment during .bdrv_refresh_limits()'
012/17:[down] 'block: Set request_alignment during .bdrv_refresh_limits()'
013/17:[down] 'block: Set default request_alignment during bdrv_refresh_limits()'
014/17:[0061] [FC] 'block: Switch transfer length bounds to byte-based'
015/17:[0008] [FC] 'block: Switch discard length bounds to byte-based'
016/17:[down] 'block: Split bdrv_merge_limits() from bdrv_refresh_limits()'
017/17:[0044] [FC] 'block: Move request_alignment into BlockLimit'

Eric Blake (17):
  block: Tighter assertions on bdrv_aligned_pwritev()
  block: Document supported flags during bdrv_aligned_preadv()
  block: Fix harmless off-by-one in bdrv_aligned_preadv()
  nbd: Allow larger requests
  nbd: Advertise realistic limits to block layer
  iscsi: Advertise realistic limits to block layer
  block: Give nonzero result to blk_get_max_transfer_length()
  blkdebug: Set request_alignment during .bdrv_refresh_limits()
  iscsi: Set request_alignment during .bdrv_refresh_limits()
  qcow2: Set request_alignment during .bdrv_refresh_limits()
  raw-win32: Set request_alignment during .bdrv_refresh_limits()
  block: Set request_alignment during .bdrv_refresh_limits()
  block: Set default request_alignment during bdrv_refresh_limits()
  block: Switch transfer length bounds to byte-based
  block: Switch discard length bounds to byte-based
  block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  block: Move request_alignment into BlockLimit

 include/block/block.h          |  1 +
 include/block/block_int.h      | 48 ++++++++++++++-------
 include/block/nbd.h            |  1 +
 include/qemu/typedefs.h        |  1 +
 include/sysemu/block-backend.h |  2 +-
 block.c                        |  3 +-
 block/blkdebug.c               | 18 ++++++--
 block/block-backend.c          |  9 ++--
 block/bochs.c                  |  7 ++-
 block/cloop.c                  |  7 ++-
 block/dmg.c                    |  7 ++-
 block/io.c                     | 96 +++++++++++++++++++++++-------------------
 block/iscsi.c                  | 40 +++++++++---------
 block/nbd-client.c             |  4 --
 block/nbd.c                    |  4 +-
 block/qcow2.c                  |  7 +--
 block/raw-posix.c              | 19 +++++----
 block/raw-win32.c              | 10 ++---
 block/raw_bsd.c                |  4 +-
 block/vvfat.c                  |  7 ++-
 hw/block/virtio-blk.c          | 10 ++---
 hw/scsi/scsi-generic.c         | 15 ++++---
 qemu-img.c                     |  9 ++--
 23 files changed, 195 insertions(+), 134 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:05   ` Fam Zheng
  2016-06-20 12:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv() Eric Blake
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

For symmetry with bdrv_aligned_preadv(), assert that the caller
really has aligned things properly. This requires adding an align
parameter, which is used now only in the new asserts, but will
come in handy in a later patch that adds auto-fragmentation to the
max transfer size, since that value need not always be a multiple
of the alignment, and therefore must be rounded down.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/io.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 42e63f7..056a101 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1242,7 +1242,7 @@ fail:
  */
 static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
-    QEMUIOVector *qiov, int flags)
+    int64_t align, QEMUIOVector *qiov, int flags)
 {
     BlockDriver *drv = bs->drv;
     bool waited;
@@ -1251,6 +1251,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     int64_t start_sector = offset >> BDRV_SECTOR_BITS;
     int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

+    assert(is_power_of_2(align));
+    assert((offset & (align - 1)) == 0);
+    assert((bytes & (align - 1)) == 0);
     assert(!qiov || bytes == qiov->size);
     assert((bs->open_flags & BDRV_O_NO_IO) == 0);
     assert(!(flags & ~BDRV_REQ_MASK));
@@ -1337,7 +1340,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs,

         memset(buf + head_padding_bytes, 0, zero_bytes);
         ret = bdrv_aligned_pwritev(bs, req, offset & ~(align - 1), align,
-                                   &local_qiov,
+                                   align, &local_qiov,
                                    flags & ~BDRV_REQ_ZERO_WRITE);
         if (ret < 0) {
             goto fail;
@@ -1350,7 +1353,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs,
     if (bytes >= align) {
         /* Write the aligned part in the middle. */
         uint64_t aligned_bytes = bytes & ~(align - 1);
-        ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes,
+        ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes, align,
                                    NULL, flags);
         if (ret < 0) {
             goto fail;
@@ -1374,7 +1377,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs,
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_AFTER_TAIL);

         memset(buf, 0, bytes);
-        ret = bdrv_aligned_pwritev(bs, req, offset, align,
+        ret = bdrv_aligned_pwritev(bs, req, offset, align, align,
                                    &local_qiov, flags & ~BDRV_REQ_ZERO_WRITE);
     }
 fail:
@@ -1499,7 +1502,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
         bytes = ROUND_UP(bytes, align);
     }

-    ret = bdrv_aligned_pwritev(bs, &req, offset, bytes,
+    ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, align,
                                use_local_qiov ? &local_qiov : qiov,
                                flags);

-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:08   ` Fam Zheng
  2016-06-20 12:10   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

We don't pass any flags on to drivers to handle.  Tighten an
assert to explain why we pass 0 to bdrv_driver_preadv(), and add
some comments on things to be aware of if we want to turn on
per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
document that we may want to consider using unmap during
copy-on-read operations where the read is all zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: retitle from 'Honor flags during bdrv_aligned_preadv()', and
change scope of patch
---
 block/io.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 056a101..9191772 100644
--- a/block/io.c
+++ b/block/io.c
@@ -933,6 +933,9 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,

     if (drv->bdrv_co_pwrite_zeroes &&
         buffer_is_zero(bounce_buffer, iov.iov_len)) {
+        /* FIXME: Should we (perhaps conditionally) be setting
+         * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
+         * that still correctly reads as zero? */
         ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
     } else {
         /* This does not change the data on the disk, it is not necessary
@@ -975,7 +978,12 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     assert((bytes & (align - 1)) == 0);
     assert(!qiov || bytes == qiov->size);
     assert((bs->open_flags & BDRV_O_NO_IO) == 0);
-    assert(!(flags & ~BDRV_REQ_MASK));
+
+    /* TODO: We would need a per-BDS .supported_read_flags and
+     * potential fallback support, if we ever implement any read flags
+     * to pass through to drivers.  For now, there aren't any
+     * passthrough flags.  */
+    assert(!(flags & ~(BDRV_REQ_NO_SERIALISING | BDRV_REQ_COPY_ON_READ)));

     /* Handle Copy on Read and associated serialisation */
     if (flags & BDRV_REQ_COPY_ON_READ) {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev() Eric Blake
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:10   ` Fam Zheng
  2016-06-20 12:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests Eric Blake
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

If the amount of data to read ends exactly on the total size
of the bs, then we were wasting time creating a local qiov
to read the data in preparation for what would normally be
appending zeroes beyond the end, even though this corner case
has nothing further to do.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

---
v2: no change, add R-b
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 9191772..383154f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1024,7 +1024,7 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     }

     max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
-    if (bytes < max_bytes) {
+    if (bytes <= max_bytes) {
         ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
     } else if (max_bytes > 0) {
         QEMUIOVector local_qiov;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (2 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-15 13:37   ` Paolo Bonzini
                     ` (2 more replies)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer Eric Blake
                   ` (13 subsequent siblings)
  17 siblings, 3 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Paolo Bonzini, Max Reitz

The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work.  It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 include/block/nbd.h | 1 +
 block/nbd-client.c  | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index b86a976..36dde24 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -76,6 +76,7 @@ enum {

 /* Maximum size of a single READ/WRITE data buffer */
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
+#define NBD_MAX_SECTORS (NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE)

 ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      struct iovec *iov,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 4d13444..420bce8 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -269,10 +269,6 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
     return -reply.error;
 }

-/* qemu-nbd has a limit of slightly less than 1M per request.  Try to
- * remain aligned to 4K. */
-#define NBD_MAX_SECTORS 2040
-
 int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
                         int nb_sectors, QEMUIOVector *qiov)
 {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (3 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-15 13:38   ` Paolo Bonzini
                     ` (2 more replies)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 06/17] iscsi: " Eric Blake
                   ` (12 subsequent siblings)
  17 siblings, 3 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Paolo Bonzini, Max Reitz

We were basing the advertisement of maximum discard and transfer
length off of UINT32_MAX, but since the rest of the block layer
has signed int limits on a transaction, nothing could ever reach
that maximum, and we risk overflowing an int once things are
converted to byte-based rather than sector-based limits.  What's
more, we DO have a much smaller limit: both the current kernel
and qemu-nbd have a hard limit of 32M on a read or write
transaction, and while they may also permit up to a full 32 bits
on a discard transaction, the upstream NBD protocol is proposing
wording that without any explicit advertisement otherwise,
clients should limit ALL requests to the same limits as read and
write, even though the other requests do not actually require as
many bytes across the wire.  So the better limit to tell the
block layer is 32M for both values.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6015e8b..bf67c8a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -362,8 +362,8 @@ static int nbd_co_flush(BlockDriverState *bs)

 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
-    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
+    bs->bl.max_discard = NBD_MAX_SECTORS;
+    bs->bl.max_transfer_length = NBD_MAX_SECTORS;
 }

 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (4 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:19   ` Fam Zheng
  2016-06-20 12:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length() Eric Blake
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Ronnie Sahlberg, Paolo Bonzini, Peter Lieven,
	Max Reitz

The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x40000000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/iscsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 7e78ade..4290e41 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1697,7 +1697,9 @@ static void iscsi_close(BlockDriverState *bs)

 static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
 {
-    return MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
+    int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
+
+    return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
 }

 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (5 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 06/17] iscsi: " Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:25   ` Fam Zheng
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits() Eric Blake
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Max Reitz, Stefan Hajnoczi,
	Michael S. Tsirkin, Paolo Bonzini

Making all callers special-case 0 as unlimited is awkward,
and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
our current block layer API limits.

In the case of scsi, this means that we now always advertise a
limit to the guest, even in cases where the underlying layers
previously use 0 for no inherent limit beyond the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/block-backend.c  |  7 ++++---
 hw/block/virtio-blk.c  |  3 +--
 hw/scsi/scsi-generic.c | 12 ++++++------
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 34500e6..1fb070b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1303,15 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
     }
 }

+/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
 int blk_get_max_transfer_length(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
+    int max = 0;

     if (bs) {
-        return bs->bl.max_transfer_length;
-    } else {
-        return 0;
+        max = bs->bl.max_transfer_length;
     }
+    return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
 }

 int blk_get_max_iov(BlockBackend *blk)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 284e646..1d2792e 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -382,7 +382,7 @@ static int multireq_compare(const void *a, const void *b)
 void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 {
     int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
-    int max_xfer_len = 0;
+    int max_xfer_len;
     int64_t sector_num = 0;

     if (mrb->num_reqs == 1) {
@@ -392,7 +392,6 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
     }

     max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
-    max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);

     qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
           &multireq_compare);
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 71372a8..db04a76 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -226,12 +226,12 @@ static void scsi_read_complete(void * opaque, int ret)
         r->req.cmd.buf[0] == INQUIRY &&
         r->req.cmd.buf[2] == 0xb0) {
         uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
-        if (max_xfer_len) {
-            stl_be_p(&r->buf[8], max_xfer_len);
-            /* Also take care of the opt xfer len. */
-            if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
-                stl_be_p(&r->buf[12], max_xfer_len);
-            }
+
+        assert(max_xfer_len);
+        stl_be_p(&r->buf[8], max_xfer_len);
+        /* Also take care of the opt xfer len. */
+        if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
+            stl_be_p(&r->buf[12], max_xfer_len);
         }
     }
     scsi_req_data(&r->req, len);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (6 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:27   ` Fam Zheng
  2016-06-21 13:27   ` Kevin Wolf
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 09/17] iscsi: " Eric Blake
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

qemu-iotests 77 is particularly sensitive to the fact that we
can specify an artificial alignment override in blkdebug, and
that override must continue to work even when limits are
refreshed on an already open device.

A later patch will be altering when bs->request_alignment
defaults are set, so fall back to sector alignment if we have
not inherited anything yet.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/blkdebug.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 20d25bd..1589fa7 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -37,6 +37,7 @@
 typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
+    int align;

     QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
     QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
@@ -382,9 +383,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     }

     /* Set request alignment */
-    align = qemu_opt_get_size(opts, "align", bs->request_alignment);
-    if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
-        bs->request_alignment = align;
+    align = qemu_opt_get_size(opts, "align",
+                              bs->request_alignment ?: BDRV_SECTOR_SIZE);
+    if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
+        s->align = align;
     } else {
         error_setg(errp, "Invalid alignment");
         ret = -EINVAL;
@@ -720,6 +722,15 @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
     bs->full_open_options = opts;
 }

+static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BDRVBlkdebugState *s = bs->opaque;
+
+    if (s->align) {
+        bs->request_alignment = s->align;
+    }
+}
+
 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
                                    BlockReopenQueue *queue, Error **errp)
 {
@@ -738,6 +749,7 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_getlength         = blkdebug_getlength,
     .bdrv_truncate          = blkdebug_truncate,
     .bdrv_refresh_filename  = blkdebug_refresh_filename,
+    .bdrv_refresh_limits    = blkdebug_refresh_limits,

     .bdrv_aio_readv         = blkdebug_aio_readv,
     .bdrv_aio_writev        = blkdebug_aio_writev,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 09/17] iscsi: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (7 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:28   ` Fam Zheng
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 10/17] qcow2: " Eric Blake
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Ronnie Sahlberg, Paolo Bonzini, Peter Lieven,
	Max Reitz

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/iscsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 4290e41..b4661c9 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1588,7 +1588,6 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }
     bs->total_sectors = sector_lun2qemu(iscsilun->num_blocks, iscsilun);
-    bs->request_alignment = iscsilun->block_size;

     /* We don't have any emulation for devices other than disks and CD-ROMs, so
      * this must be sg ioctl compatible. We force it to be sg, otherwise qemu
@@ -1710,6 +1709,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
     IscsiLun *iscsilun = bs->opaque;
     uint32_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;

+    bs->request_alignment = iscsilun->block_size;
+
     if (iscsilun->bl.max_xfer_len) {
         max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
     }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 10/17] qcow2: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (8 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 09/17] iscsi: " Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:28   ` Fam Zheng
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 11/17] raw-win32: " Eric Blake
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/qcow2.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c40baca..393afa9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -975,9 +975,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         }

         bs->encrypted = 1;
-
-        /* Encryption works on a sector granularity */
-        bs->request_alignment = BDRV_SECTOR_SIZE;
     }

     s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
@@ -1196,6 +1193,10 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;

+    if (bs->encrypted) {
+        /* Encryption works on a sector granularity */
+        bs->request_alignment = BDRV_SECTOR_SIZE;
+    }
     bs->bl.pwrite_zeroes_alignment = s->cluster_size;
 }

-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 11/17] raw-win32: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (9 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 10/17] qcow2: " Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:30   ` Fam Zheng
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 12/17] block: " Eric Blake
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

In this case, raw_probe_alignment() already did what we needed,
so just fix its signature and wire it in correctly.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/raw-win32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/raw-win32.c b/block/raw-win32.c
index fd23891..88382d9 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -222,7 +222,7 @@ static void raw_attach_aio_context(BlockDriverState *bs,
     }
 }

-static void raw_probe_alignment(BlockDriverState *bs)
+static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     DWORD sectorsPerCluster, freeClusters, totalClusters, count;
@@ -365,7 +365,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
         win32_aio_attach_aio_context(s->aio, bdrv_get_aio_context(bs));
     }

-    raw_probe_alignment(bs);
     ret = 0;
 fail:
     qemu_opts_del(opts);
@@ -550,6 +549,7 @@ BlockDriver bdrv_file = {
     .bdrv_needs_filename = true,
     .bdrv_parse_filename = raw_parse_filename,
     .bdrv_file_open     = raw_open,
+    .bdrv_refresh_limits = raw_probe_alignment,
     .bdrv_close         = raw_close,
     .bdrv_create        = raw_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 12/17] block: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (10 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 11/17] raw-win32: " Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:31   ` Fam Zheng
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits() Eric Blake
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Add a .bdrv_refresh_limits() to all four of our legacy devices
that will always be sector-only (bochs, cloop, dmg, vvfat), in
spite of their recent conversion to expose a byte interface.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/bochs.c | 7 ++++++-
 block/cloop.c | 7 ++++++-
 block/dmg.c   | 7 ++++++-
 block/vvfat.c | 7 ++++++-
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index 6c8d0f3..182c50b 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -105,7 +105,6 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;

     bs->read_only = 1; // no write support yet
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

     ret = bdrv_pread(bs->file->bs, 0, &bochs, sizeof(bochs));
     if (ret < 0) {
@@ -189,6 +188,11 @@ fail:
     return ret;
 }

+static void bochs_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 {
     BDRVBochsState *s = bs->opaque;
@@ -283,6 +287,7 @@ static BlockDriver bdrv_bochs = {
     .instance_size	= sizeof(BDRVBochsState),
     .bdrv_probe		= bochs_probe,
     .bdrv_open		= bochs_open,
+    .bdrv_refresh_limits = bochs_refresh_limits,
     .bdrv_co_preadv = bochs_co_preadv,
     .bdrv_close		= bochs_close,
 };
diff --git a/block/cloop.c b/block/cloop.c
index ea5a92b..d574003 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -67,7 +67,6 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;

     bs->read_only = 1;
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

     /* read header */
     ret = bdrv_pread(bs->file->bs, 128, &s->block_size, 4);
@@ -199,6 +198,11 @@ fail:
     return ret;
 }

+static void cloop_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline int cloop_read_block(BlockDriverState *bs, int block_num)
 {
     BDRVCloopState *s = bs->opaque;
@@ -280,6 +284,7 @@ static BlockDriver bdrv_cloop = {
     .instance_size  = sizeof(BDRVCloopState),
     .bdrv_probe     = cloop_probe,
     .bdrv_open      = cloop_open,
+    .bdrv_refresh_limits = cloop_refresh_limits,
     .bdrv_co_preadv = cloop_co_preadv,
     .bdrv_close     = cloop_close,
 };
diff --git a/block/dmg.c b/block/dmg.c
index 06eb513..1e53cd8 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -439,7 +439,6 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;

     bs->read_only = 1;
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

     s->n_chunks = 0;
     s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
@@ -547,6 +546,11 @@ fail:
     return ret;
 }

+static void dmg_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline int is_sector_in_chunk(BDRVDMGState* s,
                 uint32_t chunk_num, uint64_t sector_num)
 {
@@ -720,6 +724,7 @@ static BlockDriver bdrv_dmg = {
     .instance_size  = sizeof(BDRVDMGState),
     .bdrv_probe     = dmg_probe,
     .bdrv_open      = dmg_open,
+    .bdrv_refresh_limits = dmg_refresh_limits,
     .bdrv_co_preadv = dmg_co_preadv,
     .bdrv_close     = dmg_close,
 };
diff --git a/block/vvfat.c b/block/vvfat.c
index 6d2e21c..08b1aa3 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1180,7 +1180,6 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
         bs->read_only = 0;
     }

-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
     bs->total_sectors = cyls * heads * secs;

     if (init_directories(s, dirname, heads, secs, errp)) {
@@ -1212,6 +1211,11 @@ fail:
     return ret;
 }

+static void vvfat_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline void vvfat_close_current_file(BDRVVVFATState *s)
 {
     if(s->current_mapping) {
@@ -3049,6 +3053,7 @@ static BlockDriver bdrv_vvfat = {

     .bdrv_parse_filename    = vvfat_parse_filename,
     .bdrv_file_open         = vvfat_open,
+    .bdrv_refresh_limits    = vvfat_refresh_limits,
     .bdrv_close             = vvfat_close,

     .bdrv_co_preadv         = vvfat_co_preadv,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (11 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 12/17] block: " Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:32   ` Fam Zheng
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based Eric Blake
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Max Reitz, Stefan Hajnoczi, Fam Zheng

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block.c    | 1 -
 block/io.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index b350794..61fe1df 100644
--- a/block.c
+++ b/block.c
@@ -937,7 +937,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
         goto fail_opts;
     }

-    bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
     bs->read_only = !(bs->open_flags & BDRV_O_RDWR);

     if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
diff --git a/block/io.c b/block/io.c
index 383154f..c425ce8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -78,6 +78,9 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         return;
     }

+    /* Default alignment based on whether driver has byte interface */
+    bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
+
     /* Take some limits from the children as a default */
     if (bs->file) {
         bdrv_refresh_limits(bs->file->bs, &local_err);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (12 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:42   ` Fam Zheng
  2016-06-21 13:50   ` Kevin Wolf
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 15/17] block: Switch discard " Eric Blake
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven, Michael S. Tsirkin

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length.  Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation.  Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: Hoist unrelated cleanups earlier, use name 'max_transfer' in more
places, tweak iscsi calculations
---
 include/block/block_int.h      | 11 +++++++----
 include/sysemu/block-backend.h |  2 +-
 block/block-backend.c          | 10 +++++-----
 block/io.c                     | 23 +++++++++++------------
 block/iscsi.c                  | 20 ++++++++++++--------
 block/nbd.c                    |  2 +-
 block/raw-posix.c              |  3 ++-
 hw/block/virtio-blk.c          |  9 +++++----
 hw/scsi/scsi-generic.c         | 11 ++++++-----
 qemu-img.c                     |  8 ++++----
 10 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 16c43e2..2b45d57 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -338,11 +338,14 @@ typedef struct BlockLimits {
      * power of 2, and less than max_pwrite_zeroes if that is set */
     uint32_t pwrite_zeroes_alignment;

-    /* optimal transfer length in sectors */
-    int opt_transfer_length;
+    /* optimal transfer length in bytes (must be power of 2, and
+     * multiple of bs->request_alignment), or 0 if no preferred size */
+    uint32_t opt_transfer;

-    /* maximal transfer length in sectors */
-    int max_transfer_length;
+    /* maximal transfer length in bytes (need not be power of 2, but
+     * should be multiple of opt_transfer), or 0 for no 32-bit limit.
+     * For now, anything larger than INT_MAX is clamped down. */
+    uint32_t max_transfer;

     /* memory alignment so that no bounce buffer is needed */
     size_t min_mem_alignment;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c04af8e..2469a1c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -170,7 +170,7 @@ bool blk_is_available(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
-int blk_get_max_transfer_length(BlockBackend *blk);
+uint32_t blk_get_max_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
diff --git a/block/block-backend.c b/block/block-backend.c
index 1fb070b..e042544 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1303,16 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
     }
 }

-/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
-int blk_get_max_transfer_length(BlockBackend *blk)
+/* Returns the maximum transfer length, in bytes; guaranteed nonzero */
+uint32_t blk_get_max_transfer(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
-    int max = 0;
+    uint32_t max = 0;

     if (bs) {
-        max = bs->bl.max_transfer_length;
+        max = bs->bl.max_transfer;
     }
-    return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
+    return MIN_NON_ZERO(max, INT_MAX);
 }

 int blk_get_max_iov(BlockBackend *blk)
diff --git a/block/io.c b/block/io.c
index c425ce8..5e38ab4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -88,8 +88,8 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
             error_propagate(errp, local_err);
             return;
         }
-        bs->bl.opt_transfer_length = bs->file->bs->bl.opt_transfer_length;
-        bs->bl.max_transfer_length = bs->file->bs->bl.max_transfer_length;
+        bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer;
+        bs->bl.max_transfer = bs->file->bs->bl.max_transfer;
         bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment;
         bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment;
         bs->bl.max_iov = bs->file->bs->bl.max_iov;
@@ -107,12 +107,10 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
             error_propagate(errp, local_err);
             return;
         }
-        bs->bl.opt_transfer_length =
-            MAX(bs->bl.opt_transfer_length,
-                bs->backing->bs->bl.opt_transfer_length);
-        bs->bl.max_transfer_length =
-            MIN_NON_ZERO(bs->bl.max_transfer_length,
-                         bs->backing->bs->bl.max_transfer_length);
+        bs->bl.opt_transfer = MAX(bs->bl.opt_transfer,
+                                  bs->backing->bs->bl.opt_transfer);
+        bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+                                           bs->backing->bs->bl.max_transfer);
         bs->bl.opt_mem_alignment =
             MAX(bs->bl.opt_mem_alignment,
                 bs->backing->bs->bl.opt_mem_alignment);
@@ -1144,7 +1142,8 @@ int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     return bdrv_co_do_readv(bs, sector_num, nb_sectors, qiov, 0);
 }

-#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768
+/* Maximum buffer for write zeroes fallback, in bytes */
+#define MAX_WRITE_ZEROES_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)

 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int count, BdrvRequestFlags flags)
@@ -1202,7 +1201,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,

         if (ret == -ENOTSUP) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
-            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer_length,
+            int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
                                             MAX_WRITE_ZEROES_BOUNCE_BUFFER);
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;

@@ -1213,7 +1212,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
                 write_flags &= ~BDRV_REQ_FUA;
                 need_flush = true;
             }
-            num = MIN(num, max_xfer_len << BDRV_SECTOR_BITS);
+            num = MIN(num, max_transfer);
             iov.iov_len = num;
             if (iov.iov_base == NULL) {
                 iov.iov_base = qemu_try_blockalign(bs, num);
@@ -1230,7 +1229,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
             /* Keep bounce buffer around if it is big enough for all
              * all future requests.
              */
-            if (num < max_xfer_len << BDRV_SECTOR_BITS) {
+            if (num < max_transfer) {
                 qemu_vfree(iov.iov_base);
                 iov.iov_base = NULL;
             }
diff --git a/block/iscsi.c b/block/iscsi.c
index b4661c9..739d5ca 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -473,9 +473,10 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
         return -EINVAL;
     }

-    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
+    if (bs->bl.max_transfer &&
+        nb_sectors << BDRV_SECTOR_BITS > bs->bl.max_transfer) {
         error_report("iSCSI Error: Write of %d sectors exceeds max_xfer_len "
-                     "of %d sectors", nb_sectors, bs->bl.max_transfer_length);
+                     "of %" PRIu32 " bytes", nb_sectors, bs->bl.max_transfer);
         return -EINVAL;
     }

@@ -650,9 +651,10 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
         return -EINVAL;
     }

-    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
+    if (bs->bl.max_transfer &&
+        nb_sectors << BDRV_SECTOR_BITS > bs->bl.max_transfer) {
         error_report("iSCSI Error: Read of %d sectors exceeds max_xfer_len "
-                     "of %d sectors", nb_sectors, bs->bl.max_transfer_length);
+                     "of %" PRIu32 " bytes", nb_sectors, bs->bl.max_transfer);
         return -EINVAL;
     }

@@ -1707,7 +1709,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
      * iscsi_open(): iscsi targets don't change their limits. */

     IscsiLun *iscsilun = bs->opaque;
-    uint32_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;
+    uint64_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;

     bs->request_alignment = iscsilun->block_size;

@@ -1715,7 +1717,9 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
         max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
     }

-    bs->bl.max_transfer_length = sector_limits_lun2qemu(max_xfer_len, iscsilun);
+    if (max_xfer_len * iscsilun->block_size < INT_MAX) {
+        bs->bl.max_transfer = max_xfer_len * iscsilun->block_size;
+    }

     if (iscsilun->lbp.lbpu) {
         if (iscsilun->bl.max_unmap < 0xffffffff) {
@@ -1738,8 +1742,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
     } else {
         bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
     }
-    bs->bl.opt_transfer_length =
-        sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
+    assert(iscsilun->bl.opt_xfer_len < INT_MAX / iscsilun->block_size);
+    bs->bl.opt_transfer = iscsilun->bl.opt_xfer_len * iscsilun->block_size;
 }

 /* Note that this will not re-establish a connection with an iSCSI target - it
diff --git a/block/nbd.c b/block/nbd.c
index bf67c8a..f5511ea 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -363,7 +363,7 @@ static int nbd_co_flush(BlockDriverState *bs)
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     bs->bl.max_discard = NBD_MAX_SECTORS;
-    bs->bl.max_transfer_length = NBD_MAX_SECTORS;
+    bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
 }

 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/raw-posix.c b/block/raw-posix.c
index aacf132..f2bea85 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -752,7 +752,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
         if (S_ISBLK(st.st_mode)) {
             int ret = hdev_get_max_transfer_length(s->fd);
             if (ret >= 0) {
-                bs->bl.max_transfer_length = ret;
+                assert(ret <= BDRV_REQUEST_MAX_SECTORS);
+                bs->bl.max_transfer = ret << BDRV_SECTOR_BITS;
             }
         }
     }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 1d2792e..5000c27 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -382,7 +382,7 @@ static int multireq_compare(const void *a, const void *b)
 void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 {
     int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
-    int max_xfer_len;
+    uint32_t max_transfer;
     int64_t sector_num = 0;

     if (mrb->num_reqs == 1) {
@@ -391,7 +391,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
         return;
     }

-    max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
+    max_transfer = blk_get_max_transfer(mrb->reqs[0]->dev->blk);

     qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
           &multireq_compare);
@@ -407,8 +407,9 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
              */
             if (sector_num + nb_sectors != req->sector_num ||
                 niov > blk_get_max_iov(blk) - req->qiov.niov ||
-                req->qiov.size / BDRV_SECTOR_SIZE > max_xfer_len ||
-                nb_sectors > max_xfer_len - req->qiov.size / BDRV_SECTOR_SIZE) {
+                req->qiov.size > max_transfer ||
+                nb_sectors > (max_transfer -
+                              req->qiov.size) / BDRV_SECTOR_SIZE) {
                 submit_requests(blk, mrb, start, num_reqs, niov);
                 num_reqs = 0;
             }
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index db04a76..f59ed8f 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -225,13 +225,14 @@ static void scsi_read_complete(void * opaque, int ret)
     if (s->type == TYPE_DISK &&
         r->req.cmd.buf[0] == INQUIRY &&
         r->req.cmd.buf[2] == 0xb0) {
-        uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
+        uint32_t max_transfer =
+            blk_get_max_transfer(s->conf.blk) >> BDRV_SECTOR_BITS;

-        assert(max_xfer_len);
-        stl_be_p(&r->buf[8], max_xfer_len);
+        assert(max_transfer);
+        stl_be_p(&r->buf[8], max_transfer);
         /* Also take care of the opt xfer len. */
-        if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
-            stl_be_p(&r->buf[12], max_xfer_len);
+        if (ldl_be_p(&r->buf[12]) > max_transfer) {
+            stl_be_p(&r->buf[12], max_transfer);
         }
     }
     scsi_req_data(&r->req, len);
diff --git a/qemu-img.c b/qemu-img.c
index 14e2661..cf9876d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2078,13 +2078,13 @@ static int img_convert(int argc, char **argv)
     }
     out_bs = blk_bs(out_blk);

-    /* increase bufsectors from the default 4096 (2M) if opt_transfer_length
+    /* increase bufsectors from the default 4096 (2M) if opt_transfer
      * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB)
      * as maximum. */
     bufsectors = MIN(32768,
-                     MAX(bufsectors, MAX(out_bs->bl.opt_transfer_length,
-                                         out_bs->bl.discard_alignment))
-                    );
+                     MAX(bufsectors,
+                         MAX(out_bs->bl.opt_transfer >> BDRV_SECTOR_BITS,
+                             out_bs->bl.discard_alignment)));

     if (skip_create) {
         int64_t output_sectors = blk_nb_sectors(out_blk);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (13 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:46   ` Fam Zheng
  2016-06-21 14:05   ` Kevin Wolf
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits() Eric Blake
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
is no longer needed; and the BlockLimits type is now completely
byte-based.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: rebase nbd and iscsi limits across earlier improvements
---
 include/block/block_int.h | 21 +++++++++++++++------
 block/io.c                | 16 +++++++++-------
 block/iscsi.c             | 19 ++++++-------------
 block/nbd.c               |  2 +-
 qemu-img.c                |  3 ++-
 5 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2b45d57..0169019 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -324,18 +324,27 @@ struct BlockDriver {
 };

 typedef struct BlockLimits {
-    /* maximum number of sectors that can be discarded at once */
-    int max_discard;
+    /* maximum number of bytes that can be discarded at once (since it
+     * is signed, it must be < 2G, if set), should be multiple of
+     * pdiscard_alignment, but need not be power of 2. May be 0 if no
+     * inherent 32-bit limit */
+    int32_t max_pdiscard;

-    /* optimal alignment for discard requests in sectors */
-    int64_t discard_alignment;
+    /* optimal alignment for discard requests in bytes, must be power
+     * of 2, less than max_discard if that is set, and multiple of
+     * bs->request_alignment. May be 0 if bs->request_alignment is
+     * good enough */
+    uint32_t pdiscard_alignment;

     /* maximum number of bytes that can zeroized at once (since it is
-     * signed, it must be < 2G, if set) */
+     * signed, it must be < 2G, if set), should be multiple of
+     * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
     int32_t max_pwrite_zeroes;

     /* optimal alignment for write zeroes requests in bytes, must be
-     * power of 2, and less than max_pwrite_zeroes if that is set */
+     * power of 2, less than max_pwrite_zeroes if that is set, and
+     * multiple of bs->request_alignment. May be 0 if
+     * bs->request_alignment is good enough */
     uint32_t pwrite_zeroes_alignment;

     /* optimal transfer length in bytes (must be power of 2, and
diff --git a/block/io.c b/block/io.c
index 5e38ab4..0b5c40d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2352,19 +2352,21 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
                           BDRV_TRACKED_DISCARD);
     bdrv_set_dirty(bs, sector_num, nb_sectors);

-    max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
+    max_discard = MIN_NON_ZERO(bs->bl.max_pdiscard >> BDRV_SECTOR_BITS,
+                               BDRV_REQUEST_MAX_SECTORS);
     while (nb_sectors > 0) {
         int ret;
         int num = nb_sectors;
+        int discard_alignment = bs->bl.pdiscard_alignment >> BDRV_SECTOR_BITS;

         /* align request */
-        if (bs->bl.discard_alignment &&
-            num >= bs->bl.discard_alignment &&
-            sector_num % bs->bl.discard_alignment) {
-            if (num > bs->bl.discard_alignment) {
-                num = bs->bl.discard_alignment;
+        if (discard_alignment &&
+            num >= discard_alignment &&
+            sector_num % discard_alignment) {
+            if (num > discard_alignment) {
+                num = discard_alignment;
             }
-            num -= sector_num % bs->bl.discard_alignment;
+            num -= sector_num % discard_alignment;
         }

         /* limit request size */
diff --git a/block/iscsi.c b/block/iscsi.c
index 739d5ca..af9babf 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1696,13 +1696,6 @@ static void iscsi_close(BlockDriverState *bs)
     memset(iscsilun, 0, sizeof(IscsiLun));
 }

-static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
-{
-    int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
-
-    return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
-}
-
 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     /* We don't actually refresh here, but just return data queried in
@@ -1722,14 +1715,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
     }

     if (iscsilun->lbp.lbpu) {
-        if (iscsilun->bl.max_unmap < 0xffffffff) {
-            bs->bl.max_discard =
-                sector_limits_lun2qemu(iscsilun->bl.max_unmap, iscsilun);
+        if (iscsilun->bl.max_unmap < 0xffffffff / iscsilun->block_size) {
+            bs->bl.max_pdiscard =
+                iscsilun->bl.max_unmap * iscsilun->block_size;
         }
-        bs->bl.discard_alignment =
-            sector_limits_lun2qemu(iscsilun->bl.opt_unmap_gran, iscsilun);
+        bs->bl.pdiscard_alignment =
+            iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
     } else {
-        bs->bl.discard_alignment = iscsilun->block_size >> BDRV_SECTOR_BITS;
+        bs->bl.pdiscard_alignment = iscsilun->block_size;
     }

     if (iscsilun->bl.max_ws_len < 0xffffffff / iscsilun->block_size) {
diff --git a/block/nbd.c b/block/nbd.c
index f5511ea..08e5b67 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -362,7 +362,7 @@ static int nbd_co_flush(BlockDriverState *bs)

 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl.max_discard = NBD_MAX_SECTORS;
+    bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
     bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
 }

diff --git a/qemu-img.c b/qemu-img.c
index cf9876d..1237d61 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2084,7 +2084,8 @@ static int img_convert(int argc, char **argv)
     bufsectors = MIN(32768,
                      MAX(bufsectors,
                          MAX(out_bs->bl.opt_transfer >> BDRV_SECTOR_BITS,
-                             out_bs->bl.discard_alignment)));
+                             out_bs->bl.pdiscard_alignment >>
+                             BDRV_SECTOR_BITS)));

     if (skip_create) {
         int64_t output_sectors = blk_nb_sectors(out_blk);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (14 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 15/17] block: Switch discard " Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:50   ` Fam Zheng
  2016-06-21 14:12   ` Kevin Wolf
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit Eric Blake
  2016-06-21 14:18 ` [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Kevin Wolf
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

The raw block driver was blindly copying all limits from bs->file,
even though: 1. the main bdrv_refresh_limits() already does this
for many of gthe limits, and 2. blindly copying from the children
can weaken any stricter limits that were already inherited from
the backing dhain during the main bdrv_refresh_limits().  Also,
the next patch is about to move .request_alignment into
BlockLimits, and that is a limit that should NOT be copied from
other layers in the BDS chain.

Solve the issue by factoring out a new bdrv_merge_limits(),
and using that function to properly merge limits when comparing
two BlockDriverState objects.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 include/block/block.h     |  1 +
 include/block/block_int.h |  4 ++--
 include/qemu/typedefs.h   |  1 +
 block/io.c                | 31 +++++++++++++------------------
 block/raw_bsd.c           |  4 ++--
 5 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 733a8ec..c1d4648 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -262,6 +262,7 @@ int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
+void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
 int bdrv_change_backing_file(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0169019..88ef826 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -323,7 +323,7 @@ struct BlockDriver {
     QLIST_ENTRY(BlockDriver) list;
 };

-typedef struct BlockLimits {
+struct BlockLimits {
     /* maximum number of bytes that can be discarded at once (since it
      * is signed, it must be < 2G, if set), should be multiple of
      * pdiscard_alignment, but need not be power of 2. May be 0 if no
@@ -364,7 +364,7 @@ typedef struct BlockLimits {

     /* maximum number of iovec elements */
     int max_iov;
-} BlockLimits;
+};

 typedef struct BdrvOpBlocker BdrvOpBlocker;

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index b113fcf..e6f72c2 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
 typedef struct BlockBackend BlockBackend;
 typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
+typedef struct BlockLimits BlockLimits;
 typedef struct BusClass BusClass;
 typedef struct BusState BusState;
 typedef struct CharDriverState CharDriverState;
diff --git a/block/io.c b/block/io.c
index 0b5c40d..c6c1f7b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -67,6 +67,17 @@ static void bdrv_parent_drained_end(BlockDriverState *bs)
     }
 }

+void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
+{
+    dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
+    dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+    dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
+                                 src->opt_mem_alignment);
+    dst->min_mem_alignment = MAX(dst->min_mem_alignment,
+                                 src->min_mem_alignment);
+    dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
+}
+
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
@@ -88,11 +99,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
             error_propagate(errp, local_err);
             return;
         }
-        bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer;
-        bs->bl.max_transfer = bs->file->bs->bl.max_transfer;
-        bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment;
-        bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment;
-        bs->bl.max_iov = bs->file->bs->bl.max_iov;
+        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
     } else {
         bs->bl.min_mem_alignment = 512;
         bs->bl.opt_mem_alignment = getpagesize();
@@ -107,19 +114,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
             error_propagate(errp, local_err);
             return;
         }
-        bs->bl.opt_transfer = MAX(bs->bl.opt_transfer,
-                                  bs->backing->bs->bl.opt_transfer);
-        bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
-                                           bs->backing->bs->bl.max_transfer);
-        bs->bl.opt_mem_alignment =
-            MAX(bs->bl.opt_mem_alignment,
-                bs->backing->bs->bl.opt_mem_alignment);
-        bs->bl.min_mem_alignment =
-            MAX(bs->bl.min_mem_alignment,
-                bs->backing->bs->bl.min_mem_alignment);
-        bs->bl.max_iov =
-            MIN(bs->bl.max_iov,
-                bs->backing->bs->bl.max_iov);
+        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
     }

     /* Then let the driver override it */
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index b1d5237..379ce8d 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -1,6 +1,6 @@
 /* BlockDriver implementation for "raw"
  *
- * Copyright (C) 2010, 2013, Red Hat, Inc.
+ * Copyright (C) 2010-2016 Red Hat, Inc.
  * Copyright (C) 2010, Blue Swirl <blauwirbel@gmail.com>
  * Copyright (C) 2009, Anthony Liguori <aliguori@us.ibm.com>
  *
@@ -152,7 +152,7 @@ static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)

 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl = bs->file->bs->bl;
+    bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
 }

 static int raw_truncate(BlockDriverState *bs, int64_t offset)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (15 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits() Eric Blake
@ 2016-06-14 21:30 ` Eric Blake
  2016-06-16  5:52   ` Fam Zheng
  2016-06-21 14:16   ` Kevin Wolf
  2016-06-21 14:18 ` [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Kevin Wolf
  17 siblings, 2 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-14 21:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

It makes more sense to have ALL block size limit constraints
in the same struct.  Improve the documentation while at it.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: drop hacks for save/restore of alignment, now that earlier
patches handled setting up BlockLimits more sanely
---
 include/block/block_int.h | 22 +++++++++++++---------
 block.c                   |  2 +-
 block/blkdebug.c          |  4 ++--
 block/bochs.c             |  2 +-
 block/cloop.c             |  2 +-
 block/dmg.c               |  2 +-
 block/io.c                | 12 ++++++------
 block/iscsi.c             |  2 +-
 block/qcow2.c             |  2 +-
 block/raw-posix.c         | 16 ++++++++--------
 block/raw-win32.c         |  6 +++---
 block/vvfat.c             |  2 +-
 12 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 88ef826..77f47d9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -324,6 +324,12 @@ struct BlockDriver {
 };

 struct BlockLimits {
+    /* Alignment requirement, in bytes, for offset/length of I/O
+     * requests. Must be a power of 2 less than INT_MAX. A value of 0
+     * defaults to 1 for drivers with modern byte interfaces, and to
+     * 512 otherwise. */
+    uint32_t request_alignment;
+
     /* maximum number of bytes that can be discarded at once (since it
      * is signed, it must be < 2G, if set), should be multiple of
      * pdiscard_alignment, but need not be power of 2. May be 0 if no
@@ -332,8 +338,8 @@ struct BlockLimits {

     /* optimal alignment for discard requests in bytes, must be power
      * of 2, less than max_discard if that is set, and multiple of
-     * bs->request_alignment. May be 0 if bs->request_alignment is
-     * good enough */
+     * bl.request_alignment. May be 0 if bl.request_alignment is good
+     * enough */
     uint32_t pdiscard_alignment;

     /* maximum number of bytes that can zeroized at once (since it is
@@ -343,12 +349,12 @@ struct BlockLimits {

     /* optimal alignment for write zeroes requests in bytes, must be
      * power of 2, less than max_pwrite_zeroes if that is set, and
-     * multiple of bs->request_alignment. May be 0 if
-     * bs->request_alignment is good enough */
+     * multiple of bl.request_alignment. May be 0 if
+     * bl.request_alignment is good enough */
     uint32_t pwrite_zeroes_alignment;

     /* optimal transfer length in bytes (must be power of 2, and
-     * multiple of bs->request_alignment), or 0 if no preferred size */
+     * multiple of bl.request_alignment), or 0 if no preferred size */
     uint32_t opt_transfer;

     /* maximal transfer length in bytes (need not be power of 2, but
@@ -356,10 +362,10 @@ struct BlockLimits {
      * For now, anything larger than INT_MAX is clamped down. */
     uint32_t max_transfer;

-    /* memory alignment so that no bounce buffer is needed */
+    /* memory alignment, in bytes so that no bounce buffer is needed */
     size_t min_mem_alignment;

-    /* memory alignment for bounce buffer */
+    /* memory alignment, in bytes, for bounce buffer */
     size_t opt_mem_alignment;

     /* maximum number of iovec elements */
@@ -463,8 +469,6 @@ struct BlockDriverState {
     /* I/O Limits */
     BlockLimits bl;

-    /* Alignment requirement for offset/length of I/O requests */
-    unsigned int request_alignment;
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
     unsigned int supported_write_flags;
     /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
diff --git a/block.c b/block.c
index 61fe1df..d578df8 100644
--- a/block.c
+++ b/block.c
@@ -1016,7 +1016,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,

     assert(bdrv_opt_mem_align(bs) != 0);
     assert(bdrv_min_mem_align(bs) != 0);
-    assert(is_power_of_2(bs->request_alignment) || bdrv_is_sg(bs));
+    assert(is_power_of_2(bs->bl.request_alignment) || bdrv_is_sg(bs));

     qemu_opts_del(opts);
     return 0;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1589fa7..5e32643 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -384,7 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,

     /* Set request alignment */
     align = qemu_opt_get_size(opts, "align",
-                              bs->request_alignment ?: BDRV_SECTOR_SIZE);
+                              bs->bl.request_alignment ?: BDRV_SECTOR_SIZE);
     if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
         s->align = align;
     } else {
@@ -727,7 +727,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
     BDRVBlkdebugState *s = bs->opaque;

     if (s->align) {
-        bs->request_alignment = s->align;
+        bs->bl.request_alignment = s->align;
     }
 }

diff --git a/block/bochs.c b/block/bochs.c
index 182c50b..4194f1d 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -190,7 +190,7 @@ fail:

 static void bochs_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+    bs->bl.request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O */
 }

 static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
diff --git a/block/cloop.c b/block/cloop.c
index d574003..b5dc286 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -200,7 +200,7 @@ fail:

 static void cloop_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+    bs->bl.request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O */
 }

 static inline int cloop_read_block(BlockDriverState *bs, int block_num)
diff --git a/block/dmg.c b/block/dmg.c
index 1e53cd8..9612c21 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -548,7 +548,7 @@ fail:

 static void dmg_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+    bs->bl.request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O */
 }

 static inline int is_sector_in_chunk(BDRVDMGState* s,
diff --git a/block/io.c b/block/io.c
index c6c1f7b..a82bc94 100644
--- a/block/io.c
+++ b/block/io.c
@@ -90,7 +90,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
     }

     /* Default alignment based on whether driver has byte interface */
-    bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
+    bs->bl.request_alignment = drv->bdrv_co_preadv ? 1 : 512;

     /* Take some limits from the children as a default */
     if (bs->file) {
@@ -447,7 +447,7 @@ static int bdrv_get_cluster_size(BlockDriverState *bs)

     ret = bdrv_get_info(bs, &bdi);
     if (ret < 0 || bdi.cluster_size == 0) {
-        return bs->request_alignment;
+        return bs->bl.request_alignment;
     } else {
         return bdi.cluster_size;
     }
@@ -1056,7 +1056,7 @@ int coroutine_fn bdrv_co_preadv(BlockDriverState *bs,
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;

-    uint64_t align = bs->request_alignment;
+    uint64_t align = bs->bl.request_alignment;
     uint8_t *head_buf = NULL;
     uint8_t *tail_buf = NULL;
     QEMUIOVector local_qiov;
@@ -1153,7 +1153,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,

     int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment ?: 1,
-                        bs->request_alignment);
+                        bs->bl.request_alignment);

     assert(is_power_of_2(alignment));
     head = offset & (alignment - 1);
@@ -1312,7 +1312,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs,
     uint8_t *buf = NULL;
     QEMUIOVector local_qiov;
     struct iovec iov;
-    uint64_t align = bs->request_alignment;
+    uint64_t align = bs->bl.request_alignment;
     unsigned int head_padding_bytes, tail_padding_bytes;
     int ret = 0;

@@ -1399,7 +1399,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
     BdrvRequestFlags flags)
 {
     BdrvTrackedRequest req;
-    uint64_t align = bs->request_alignment;
+    uint64_t align = bs->bl.request_alignment;
     uint8_t *head_buf = NULL;
     uint8_t *tail_buf = NULL;
     QEMUIOVector local_qiov;
diff --git a/block/iscsi.c b/block/iscsi.c
index af9babf..16d7522 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1704,7 +1704,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
     IscsiLun *iscsilun = bs->opaque;
     uint64_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;

-    bs->request_alignment = iscsilun->block_size;
+    bs->bl.request_alignment = iscsilun->block_size;

     if (iscsilun->bl.max_xfer_len) {
         max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
diff --git a/block/qcow2.c b/block/qcow2.c
index 393afa9..7cb7f21 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1195,7 +1195,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)

     if (bs->encrypted) {
         /* Encryption works on a sector granularity */
-        bs->request_alignment = BDRV_SECTOR_SIZE;
+        bs->bl.request_alignment = BDRV_SECTOR_SIZE;
     }
     bs->bl.pwrite_zeroes_alignment = s->cluster_size;
 }
diff --git a/block/raw-posix.c b/block/raw-posix.c
index f2bea85..3c00221 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -302,22 +302,22 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     /* For SCSI generic devices the alignment is not really used.
        With buffered I/O, we don't have any restrictions. */
     if (bdrv_is_sg(bs) || !s->needs_alignment) {
-        bs->request_alignment = 1;
+        bs->bl.request_alignment = 1;
         s->buf_align = 1;
         return;
     }

-    bs->request_alignment = 0;
+    bs->bl.request_alignment = 0;
     s->buf_align = 0;
     /* Let's try to use the logical blocksize for the alignment. */
-    if (probe_logical_blocksize(fd, &bs->request_alignment) < 0) {
-        bs->request_alignment = 0;
+    if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
+        bs->bl.request_alignment = 0;
     }
 #ifdef CONFIG_XFS
     if (s->is_xfs) {
         struct dioattr da;
         if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, &da) >= 0) {
-            bs->request_alignment = da.d_miniosz;
+            bs->bl.request_alignment = da.d_miniosz;
             /* The kernel returns wrong information for d_mem */
             /* s->buf_align = da.d_mem; */
         }
@@ -337,19 +337,19 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
         qemu_vfree(buf);
     }

-    if (!bs->request_alignment) {
+    if (!bs->bl.request_alignment) {
         size_t align;
         buf = qemu_memalign(s->buf_align, max_align);
         for (align = 512; align <= max_align; align <<= 1) {
             if (raw_is_io_aligned(fd, buf, align)) {
-                bs->request_alignment = align;
+                bs->bl.request_alignment = align;
                 break;
             }
         }
         qemu_vfree(buf);
     }

-    if (!s->buf_align || !bs->request_alignment) {
+    if (!s->buf_align || !bs->bl.request_alignment) {
         error_setg(errp, "Could not find working O_DIRECT alignment. "
                          "Try cache.direct=off.");
     }
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 88382d9..62edb1a 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -230,14 +230,14 @@ static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
     BOOL status;

     if (s->type == FTYPE_CD) {
-        bs->request_alignment = 2048;
+        bs->bl.request_alignment = 2048;
         return;
     }
     if (s->type == FTYPE_HARDDISK) {
         status = DeviceIoControl(s->hfile, IOCTL_DISK_GET_DRIVE_GEOMETRY_EX,
                                  NULL, 0, &dg, sizeof(dg), &count, NULL);
         if (status != 0) {
-            bs->request_alignment = dg.Geometry.BytesPerSector;
+            bs->bl.request_alignment = dg.Geometry.BytesPerSector;
             return;
         }
         /* try GetDiskFreeSpace too */
@@ -247,7 +247,7 @@ static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
         GetDiskFreeSpace(s->drive_path, &sectorsPerCluster,
                          &dg.Geometry.BytesPerSector,
                          &freeClusters, &totalClusters);
-        bs->request_alignment = dg.Geometry.BytesPerSector;
+        bs->bl.request_alignment = dg.Geometry.BytesPerSector;
     }
 }

diff --git a/block/vvfat.c b/block/vvfat.c
index 08b1aa3..8fa8023 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1213,7 +1213,7 @@ fail:

 static void vvfat_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+    bs->bl.request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O */
 }

 static inline void vvfat_close_current_file(BDRVVVFATState *s)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests Eric Blake
@ 2016-06-15 13:37   ` Paolo Bonzini
  2016-06-16  5:12   ` Fam Zheng
  2016-06-20 12:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2016-06-15 13:37 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf, Max Reitz



On 14/06/2016 23:30, Eric Blake wrote:
> The NBD layer was breaking up request at a limit of 2040 sectors
> (just under 1M) to cater to old qemu-nbd. But the server limit
> was raised to 32M in commit 2d8214885 to match the kernel, more
> than three years ago; and the upstream NBD Protocol is proposing
> documentation that without any explicit communication to state
> otherwise, a client should be able to safely assume that a 32M
> transaction will work.  It is time to rely on the larger sizing,
> and any downstream distro that cares about maximum
> interoperability to older qemu-nbd servers can just tweak the
> value of #define NBD_MAX_SECTORS.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  include/block/nbd.h | 1 +
>  block/nbd-client.c  | 4 ----
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index b86a976..36dde24 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -76,6 +76,7 @@ enum {
> 
>  /* Maximum size of a single READ/WRITE data buffer */
>  #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
> +#define NBD_MAX_SECTORS (NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE)
> 
>  ssize_t nbd_wr_syncv(QIOChannel *ioc,
>                       struct iovec *iov,
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 4d13444..420bce8 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -269,10 +269,6 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
>      return -reply.error;
>  }
> 
> -/* qemu-nbd has a limit of slightly less than 1M per request.  Try to
> - * remain aligned to 4K. */
> -#define NBD_MAX_SECTORS 2040
> -
>  int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
>                          int nb_sectors, QEMUIOVector *qiov)
>  {
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer Eric Blake
@ 2016-06-15 13:38   ` Paolo Bonzini
  2016-06-22 20:58     ` Eric Blake
  2016-06-16  5:13   ` Fam Zheng
  2016-06-20 12:14   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2 siblings, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2016-06-15 13:38 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf, qemu-stable, Max Reitz



On 14/06/2016 23:30, Eric Blake wrote:
> We were basing the advertisement of maximum discard and transfer
> length off of UINT32_MAX, but since the rest of the block layer
> has signed int limits on a transaction, nothing could ever reach
> that maximum, and we risk overflowing an int once things are
> converted to byte-based rather than sector-based limits.  What's
> more, we DO have a much smaller limit: both the current kernel
> and qemu-nbd have a hard limit of 32M on a read or write
> transaction, and while they may also permit up to a full 32 bits
> on a discard transaction, the upstream NBD protocol is proposing
> wording that without any explicit advertisement otherwise,
> clients should limit ALL requests to the same limits as read and
> write, even though the other requests do not actually require as
> many bytes across the wire.  So the better limit to tell the
> block layer is 32M for both values.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  block/nbd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 6015e8b..bf67c8a 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -362,8 +362,8 @@ static int nbd_co_flush(BlockDriverState *bs)
> 
>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
> -    bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
> -    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
> +    bs->bl.max_discard = NBD_MAX_SECTORS;
> +    bs->bl.max_transfer_length = NBD_MAX_SECTORS;
>  }
> 
>  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev() Eric Blake
@ 2016-06-16  5:05   ` Fam Zheng
  2016-06-20 12:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> For symmetry with bdrv_aligned_preadv(), assert that the caller
> really has aligned things properly. This requires adding an align
> parameter, which is used now only in the new asserts, but will
> come in handy in a later patch that adds auto-fragmentation to the
> max transfer size, since that value need not always be a multiple
> of the alignment, and therefore must be rounded down.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv() Eric Blake
@ 2016-06-16  5:08   ` Fam Zheng
  2016-06-20 12:10   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:08 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> We don't pass any flags on to drivers to handle.  Tighten an
> assert to explain why we pass 0 to bdrv_driver_preadv(), and add
> some comments on things to be aware of if we want to turn on
> per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
> document that we may want to consider using unmap during
> copy-on-read operations where the read is all zeroes.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
@ 2016-06-16  5:10   ` Fam Zheng
  2016-06-20 12:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:10 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> If the amount of data to read ends exactly on the total size
> of the bs, then we were wasting time creating a local qiov
> to read the data in preparation for what would normally be
> appending zeroes beyond the end, even though this corner case
> has nothing further to do.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests Eric Blake
  2016-06-15 13:37   ` Paolo Bonzini
@ 2016-06-16  5:12   ` Fam Zheng
  2016-06-20 12:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Paolo Bonzini, qemu-block, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> The NBD layer was breaking up request at a limit of 2040 sectors
> (just under 1M) to cater to old qemu-nbd. But the server limit
> was raised to 32M in commit 2d8214885 to match the kernel, more
> than three years ago; and the upstream NBD Protocol is proposing
> documentation that without any explicit communication to state
> otherwise, a client should be able to safely assume that a 32M
> transaction will work.  It is time to rely on the larger sizing,
> and any downstream distro that cares about maximum
> interoperability to older qemu-nbd servers can just tweak the
> value of #define NBD_MAX_SECTORS.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer Eric Blake
  2016-06-15 13:38   ` Paolo Bonzini
@ 2016-06-16  5:13   ` Fam Zheng
  2016-06-20 12:14   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:13 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Paolo Bonzini, qemu-block, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> We were basing the advertisement of maximum discard and transfer
> length off of UINT32_MAX, but since the rest of the block layer
> has signed int limits on a transaction, nothing could ever reach
> that maximum, and we risk overflowing an int once things are
> converted to byte-based rather than sector-based limits.  What's
> more, we DO have a much smaller limit: both the current kernel
> and qemu-nbd have a hard limit of 32M on a read or write
> transaction, and while they may also permit up to a full 32 bits
> on a discard transaction, the upstream NBD protocol is proposing
> wording that without any explicit advertisement otherwise,
> clients should limit ALL requests to the same limits as read and
> write, even though the other requests do not actually require as
> many bytes across the wire.  So the better limit to tell the
> block layer is 32M for both values.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 06/17] iscsi: " Eric Blake
@ 2016-06-16  5:19   ` Fam Zheng
  2016-06-20 12:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:19 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, Peter Lieven, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini

On Tue, 06/14 15:30, Eric Blake wrote:
> The function sector_limits_lun2qemu() returns a value in units of
> the block layer's 512-byte sector, and can be as large as
> 0x40000000, which is much larger than the block layer's inherent
> limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
> handles '0' as a synonym to the inherent limit, and it is nicer
> to return this value than it is to calculate an arbitrary
> maximum, for two reasons: we want to ensure that the block layer
> continues to special-case '0' as 'no limit beyond the inherent
> limits'; and we want to be able to someday expand the block
> layer to allow 64-bit limits, where auditing for uses of
> BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
> artificially constraining iscsi to old block layer limits.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length() Eric Blake
@ 2016-06-16  5:25   ` Fam Zheng
  2016-06-16 13:22     ` Paolo Bonzini
  0 siblings, 1 reply; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:25 UTC (permalink / raw)
  To: Eric Blake, pbonzini
  Cc: qemu-devel, kwolf, qemu-block, Michael S. Tsirkin, Max Reitz,
	Stefan Hajnoczi

On Tue, 06/14 15:30, Eric Blake wrote:
> Making all callers special-case 0 as unlimited is awkward,
> and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
> our current block layer API limits.
> 
> In the case of scsi, this means that we now always advertise a
> limit to the guest, even in cases where the underlying layers
> previously use 0 for no inherent limit beyond the block layer.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  block/block-backend.c  |  7 ++++---
>  hw/block/virtio-blk.c  |  3 +--
>  hw/scsi/scsi-generic.c | 12 ++++++------
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 34500e6..1fb070b 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1303,15 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
>      }
>  }
> 
> +/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
>  int blk_get_max_transfer_length(BlockBackend *blk)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> +    int max = 0;
> 
>      if (bs) {
> -        return bs->bl.max_transfer_length;
> -    } else {
> -        return 0;
> +        max = bs->bl.max_transfer_length;
>      }
> +    return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
>  }
> 
>  int blk_get_max_iov(BlockBackend *blk)
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 284e646..1d2792e 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -382,7 +382,7 @@ static int multireq_compare(const void *a, const void *b)
>  void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>  {
>      int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
> -    int max_xfer_len = 0;
> +    int max_xfer_len;
>      int64_t sector_num = 0;
> 
>      if (mrb->num_reqs == 1) {
> @@ -392,7 +392,6 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>      }
> 
>      max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
> -    max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);
> 
>      qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
>            &multireq_compare);
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 71372a8..db04a76 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -226,12 +226,12 @@ static void scsi_read_complete(void * opaque, int ret)
>          r->req.cmd.buf[0] == INQUIRY &&
>          r->req.cmd.buf[2] == 0xb0) {
>          uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
> -        if (max_xfer_len) {
> -            stl_be_p(&r->buf[8], max_xfer_len);
> -            /* Also take care of the opt xfer len. */
> -            if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
> -                stl_be_p(&r->buf[12], max_xfer_len);
> -            }
> +
> +        assert(max_xfer_len);
> +        stl_be_p(&r->buf[8], max_xfer_len);
> +        /* Also take care of the opt xfer len. */
> +        if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
> +            stl_be_p(&r->buf[12], max_xfer_len);

Paolo: should we take s->blocksize into account? I missed it when I wrote this
piece of code.

Fam

>          }
>      }
>      scsi_req_data(&r->req, len);
> -- 
> 2.5.5
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits() Eric Blake
@ 2016-06-16  5:27   ` Fam Zheng
  2016-06-21 13:27   ` Kevin Wolf
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:27 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, qemu-block, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> qemu-iotests 77 is particularly sensitive to the fact that we
> can specify an artificial alignment override in blkdebug, and
> that override must continue to work even when limits are
> refreshed on an already open device.
> 
> A later patch will be altering when bs->request_alignment
> defaults are set, so fall back to sector alignment if we have
> not inherited anything yet.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/17] iscsi: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 09/17] iscsi: " Eric Blake
@ 2016-06-16  5:28   ` Fam Zheng
  0 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, Peter Lieven, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini

On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] qcow2: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 10/17] qcow2: " Eric Blake
@ 2016-06-16  5:28   ` Fam Zheng
  0 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:28 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, qemu-block, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] raw-win32: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 11/17] raw-win32: " Eric Blake
@ 2016-06-16  5:30   ` Fam Zheng
  0 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, qemu-block, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> In this case, raw_probe_alignment() already did what we needed,
> so just fix its signature and wire it in correctly.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] block: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 12/17] block: " Eric Blake
@ 2016-06-16  5:31   ` Fam Zheng
  0 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:31 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Stefan Hajnoczi, qemu-block, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Add a .bdrv_refresh_limits() to all four of our legacy devices
> that will always be sector-only (bochs, cloop, dmg, vvfat), in
> spite of their recent conversion to expose a byte interface.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits() Eric Blake
@ 2016-06-16  5:32   ` Fam Zheng
  0 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:32 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf, Max Reitz, Stefan Hajnoczi

On Tue, 06/14 15:30, Eric Blake wrote:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> Now that all drivers have been updated to supply an override
> of request_alignment during their .bdrv_refresh_limits(), as
> needed, the block layer itself can defer setting the default
> alignment until part of the overall bdrv_refresh_limits().
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based Eric Blake
@ 2016-06-16  5:42   ` Fam Zheng
  2016-06-21 13:50   ` Kevin Wolf
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:42 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, kwolf, Max Reitz, Stefan Hajnoczi,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven, Michael S. Tsirkin

On Tue, 06/14 15:30, Eric Blake wrote:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_transfer_length
> and opt_transfer_length.  Rename them (dropping the _length suffix)
> so that the compiler will help us catch the change in semantics
> across any rebased code, and improve the documentation.  Use unsigned
> values, so that we don't have to worry about negative values and
> so that bit-twiddling is easier; however, we are still constrained
> by 2^31 of signed int in most APIs.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Looks good apart from the scsi-generic blocksize calculation, which is not an
issue of this patch.

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 15/17] block: Switch discard " Eric Blake
@ 2016-06-16  5:46   ` Fam Zheng
  2016-06-16 14:21     ` Eric Blake
  2016-06-21 14:05   ` Kevin Wolf
  1 sibling, 1 reply; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:46 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

On Tue, 06/14 15:30, Eric Blake wrote:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_discard and
> discard_alignment.  Rename them, using 'pdiscard' as an aid to
> track which remaining discard interfaces need conversion, and so
> that the compiler will help us catch the change in semantics
> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
> is no longer needed; and the BlockLimits type is now completely
> byte-based.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: rebase nbd and iscsi limits across earlier improvements
> ---
>  include/block/block_int.h | 21 +++++++++++++++------
>  block/io.c                | 16 +++++++++-------
>  block/iscsi.c             | 19 ++++++-------------
>  block/nbd.c               |  2 +-
>  qemu-img.c                |  3 ++-
>  5 files changed, 33 insertions(+), 28 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2b45d57..0169019 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -324,18 +324,27 @@ struct BlockDriver {
>  };
> 
>  typedef struct BlockLimits {
> -    /* maximum number of sectors that can be discarded at once */
> -    int max_discard;
> +    /* maximum number of bytes that can be discarded at once (since it
> +     * is signed, it must be < 2G, if set), should be multiple of
> +     * pdiscard_alignment, but need not be power of 2. May be 0 if no
> +     * inherent 32-bit limit */
> +    int32_t max_pdiscard;

Why not use uint32_t?

> 
> -    /* optimal alignment for discard requests in sectors */
> -    int64_t discard_alignment;
> +    /* optimal alignment for discard requests in bytes, must be power
> +     * of 2, less than max_discard if that is set, and multiple of
> +     * bs->request_alignment. May be 0 if bs->request_alignment is
> +     * good enough */
> +    uint32_t pdiscard_alignment;
> 
>      /* maximum number of bytes that can zeroized at once (since it is
> -     * signed, it must be < 2G, if set) */
> +     * signed, it must be < 2G, if set), should be multiple of
> +     * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>      int32_t max_pwrite_zeroes;
> 
>      /* optimal alignment for write zeroes requests in bytes, must be
> -     * power of 2, and less than max_pwrite_zeroes if that is set */
> +     * power of 2, less than max_pwrite_zeroes if that is set, and
> +     * multiple of bs->request_alignment. May be 0 if
> +     * bs->request_alignment is good enough */
>      uint32_t pwrite_zeroes_alignment;
> 
>      /* optimal transfer length in bytes (must be power of 2, and
> diff --git a/block/io.c b/block/io.c
> index 5e38ab4..0b5c40d 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2352,19 +2352,21 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>                            BDRV_TRACKED_DISCARD);
>      bdrv_set_dirty(bs, sector_num, nb_sectors);
> 
> -    max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
> +    max_discard = MIN_NON_ZERO(bs->bl.max_pdiscard >> BDRV_SECTOR_BITS,
> +                               BDRV_REQUEST_MAX_SECTORS);
>      while (nb_sectors > 0) {
>          int ret;
>          int num = nb_sectors;
> +        int discard_alignment = bs->bl.pdiscard_alignment >> BDRV_SECTOR_BITS;
> 
>          /* align request */
> -        if (bs->bl.discard_alignment &&
> -            num >= bs->bl.discard_alignment &&
> -            sector_num % bs->bl.discard_alignment) {
> -            if (num > bs->bl.discard_alignment) {
> -                num = bs->bl.discard_alignment;
> +        if (discard_alignment &&
> +            num >= discard_alignment &&
> +            sector_num % discard_alignment) {
> +            if (num > discard_alignment) {
> +                num = discard_alignment;
>              }
> -            num -= sector_num % bs->bl.discard_alignment;
> +            num -= sector_num % discard_alignment;
>          }
> 
>          /* limit request size */
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 739d5ca..af9babf 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1696,13 +1696,6 @@ static void iscsi_close(BlockDriverState *bs)
>      memset(iscsilun, 0, sizeof(IscsiLun));
>  }
> 
> -static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
> -{
> -    int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
> -
> -    return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
> -}
> -
>  static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      /* We don't actually refresh here, but just return data queried in
> @@ -1722,14 +1715,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>      }
> 
>      if (iscsilun->lbp.lbpu) {
> -        if (iscsilun->bl.max_unmap < 0xffffffff) {
> -            bs->bl.max_discard =
> -                sector_limits_lun2qemu(iscsilun->bl.max_unmap, iscsilun);
> +        if (iscsilun->bl.max_unmap < 0xffffffff / iscsilun->block_size) {
> +            bs->bl.max_pdiscard =
> +                iscsilun->bl.max_unmap * iscsilun->block_size;
>          }
> -        bs->bl.discard_alignment =
> -            sector_limits_lun2qemu(iscsilun->bl.opt_unmap_gran, iscsilun);
> +        bs->bl.pdiscard_alignment =
> +            iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
>      } else {
> -        bs->bl.discard_alignment = iscsilun->block_size >> BDRV_SECTOR_BITS;
> +        bs->bl.pdiscard_alignment = iscsilun->block_size;
>      }
> 
>      if (iscsilun->bl.max_ws_len < 0xffffffff / iscsilun->block_size) {
> diff --git a/block/nbd.c b/block/nbd.c
> index f5511ea..08e5b67 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -362,7 +362,7 @@ static int nbd_co_flush(BlockDriverState *bs)
> 
>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
> -    bs->bl.max_discard = NBD_MAX_SECTORS;
> +    bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
>      bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
>  }
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index cf9876d..1237d61 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2084,7 +2084,8 @@ static int img_convert(int argc, char **argv)
>      bufsectors = MIN(32768,
>                       MAX(bufsectors,
>                           MAX(out_bs->bl.opt_transfer >> BDRV_SECTOR_BITS,
> -                             out_bs->bl.discard_alignment)));
> +                             out_bs->bl.pdiscard_alignment >>
> +                             BDRV_SECTOR_BITS)));
> 
>      if (skip_create) {
>          int64_t output_sectors = blk_nb_sectors(out_blk);
> -- 
> 2.5.5
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits() Eric Blake
@ 2016-06-16  5:50   ` Fam Zheng
  2016-06-16 14:22     ` Eric Blake
  2016-06-21 14:12   ` Kevin Wolf
  1 sibling, 1 reply; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:50 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

On Tue, 06/14 15:30, Eric Blake wrote:
> The raw block driver was blindly copying all limits from bs->file,
> even though: 1. the main bdrv_refresh_limits() already does this
> for many of gthe limits, and 2. blindly copying from the children

s/gthe/the ?

> can weaken any stricter limits that were already inherited from
> the backing dhain during the main bdrv_refresh_limits().  Also,
> the next patch is about to move .request_alignment into
> BlockLimits, and that is a limit that should NOT be copied from
> other layers in the BDS chain.
> 
> Solve the issue by factoring out a new bdrv_merge_limits(),
> and using that function to properly merge limits when comparing
> two BlockDriverState objects.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit Eric Blake
@ 2016-06-16  5:52   ` Fam Zheng
  2016-06-21 14:16   ` Kevin Wolf
  1 sibling, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-16  5:52 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, kwolf, Max Reitz, Stefan Hajnoczi,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

On Tue, 06/14 15:30, Eric Blake wrote:
> It makes more sense to have ALL block size limit constraints
> in the same struct.  Improve the documentation while at it.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length()
  2016-06-16  5:25   ` Fam Zheng
@ 2016-06-16 13:22     ` Paolo Bonzini
  0 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2016-06-16 13:22 UTC (permalink / raw)
  To: Fam Zheng, Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, Michael S. Tsirkin, Max Reitz,
	Stefan Hajnoczi



On 16/06/2016 07:25, Fam Zheng wrote:
>> > +        assert(max_xfer_len);
>> > +        stl_be_p(&r->buf[8], max_xfer_len);
>> > +        /* Also take care of the opt xfer len. */
>> > +        if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
>> > +            stl_be_p(&r->buf[12], max_xfer_len);
> Paolo: should we take s->blocksize into account? I missed it when I wrote this
> piece of code.

Hmm, the maximum and optimal transfer length is in logical blocks, so yes.

There is one thing that isn't great: at least Linux reads the capacity
before the block limits page, but we cannot know for sure that everyone
does it.  I think it's a good idea to send a READ CAPACITY(10) at
startup, like we do in get_stream_blocksize.  We only need it for the
blocksize, not for the capacity, so it's not necessary to use READ
CAPACITY(16).

The snooping is still necessary for the max_lba (which is used by
scsi_disk_dma_command), so we might as well keep the assignment to
s->blocksize in there too.  But let's add a best effort READ
CAPACITY(10) at startup.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
  2016-06-16  5:46   ` Fam Zheng
@ 2016-06-16 14:21     ` Eric Blake
  2016-06-17  2:28       ` Fam Zheng
  0 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-16 14:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

[-- Attachment #1: Type: text/plain, Size: 2111 bytes --]

On 06/15/2016 11:46 PM, Fam Zheng wrote:
> On Tue, 06/14 15:30, Eric Blake wrote:
>> Sector-based limits are awkward to think about; in our on-going
>> quest to move to byte-based interfaces, convert max_discard and
>> discard_alignment.  Rename them, using 'pdiscard' as an aid to
>> track which remaining discard interfaces need conversion, and so
>> that the compiler will help us catch the change in semantics
>> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
>> is no longer needed; and the BlockLimits type is now completely
>> byte-based.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>>
>>  typedef struct BlockLimits {
>> -    /* maximum number of sectors that can be discarded at once */
>> -    int max_discard;
>> +    /* maximum number of bytes that can be discarded at once (since it
>> +     * is signed, it must be < 2G, if set), should be multiple of
>> +     * pdiscard_alignment, but need not be power of 2. May be 0 if no
>> +     * inherent 32-bit limit */
>> +    int32_t max_pdiscard;
> 
> Why not use uint32_t?
> 
>>
>> -    /* optimal alignment for discard requests in sectors */
>> -    int64_t discard_alignment;
>> +    /* optimal alignment for discard requests in bytes, must be power
>> +     * of 2, less than max_discard if that is set, and multiple of
>> +     * bs->request_alignment. May be 0 if bs->request_alignment is
>> +     * good enough */
>> +    uint32_t pdiscard_alignment;
>>
>>      /* maximum number of bytes that can zeroized at once (since it is
>> -     * signed, it must be < 2G, if set) */
>> +     * signed, it must be < 2G, if set), should be multiple of
>> +     * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>>      int32_t max_pwrite_zeroes;

Because max_pwrite_zeroes didn't.

And because we're still limited by INT_MAX (or it's alternative spelling
BDRV_REQUEST_MAX_SECTORS).

Maybe we should switch both to uint32_t, but that can be a followup.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  2016-06-16  5:50   ` Fam Zheng
@ 2016-06-16 14:22     ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-16 14:22 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]

On 06/15/2016 11:50 PM, Fam Zheng wrote:
> On Tue, 06/14 15:30, Eric Blake wrote:
>> The raw block driver was blindly copying all limits from bs->file,
>> even though: 1. the main bdrv_refresh_limits() already does this
>> for many of gthe limits, and 2. blindly copying from the children
> 
> s/gthe/the ?

Yep. [Sometimes it's interesting to stick in a typo, just to see who
notices. Other times it's just me fat-fingering things]

> 
>> can weaken any stricter limits that were already inherited from
>> the backing dhain during the main bdrv_refresh_limits().  Also,
>> the next patch is about to move .request_alignment into
>> BlockLimits, and that is a limit that should NOT be copied from
>> other layers in the BDS chain.
>>
>> Solve the issue by factoring out a new bdrv_merge_limits(),
>> and using that function to properly merge limits when comparing
>> two BlockDriverState objects.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> Reviewed-by: Fam Zheng <famz@redhat.com>
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
  2016-06-16 14:21     ` Eric Blake
@ 2016-06-17  2:28       ` Fam Zheng
  0 siblings, 0 replies; 63+ messages in thread
From: Fam Zheng @ 2016-06-17  2:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, kwolf, Stefan Hajnoczi, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

On Thu, 06/16 08:21, Eric Blake wrote:
> On 06/15/2016 11:46 PM, Fam Zheng wrote:
> > On Tue, 06/14 15:30, Eric Blake wrote:
> >> Sector-based limits are awkward to think about; in our on-going
> >> quest to move to byte-based interfaces, convert max_discard and
> >> discard_alignment.  Rename them, using 'pdiscard' as an aid to
> >> track which remaining discard interfaces need conversion, and so
> >> that the compiler will help us catch the change in semantics
> >> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
> >> is no longer needed; and the BlockLimits type is now completely
> >> byte-based.
> >>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> >>
> 
> >>
> >>  typedef struct BlockLimits {
> >> -    /* maximum number of sectors that can be discarded at once */
> >> -    int max_discard;
> >> +    /* maximum number of bytes that can be discarded at once (since it
> >> +     * is signed, it must be < 2G, if set), should be multiple of
> >> +     * pdiscard_alignment, but need not be power of 2. May be 0 if no
> >> +     * inherent 32-bit limit */
> >> +    int32_t max_pdiscard;
> > 
> > Why not use uint32_t?
> > 
> >>
> >> -    /* optimal alignment for discard requests in sectors */
> >> -    int64_t discard_alignment;
> >> +    /* optimal alignment for discard requests in bytes, must be power
> >> +     * of 2, less than max_discard if that is set, and multiple of
> >> +     * bs->request_alignment. May be 0 if bs->request_alignment is
> >> +     * good enough */
> >> +    uint32_t pdiscard_alignment;
> >>
> >>      /* maximum number of bytes that can zeroized at once (since it is
> >> -     * signed, it must be < 2G, if set) */
> >> +     * signed, it must be < 2G, if set), should be multiple of
> >> +     * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
> >>      int32_t max_pwrite_zeroes;
> 
> Because max_pwrite_zeroes didn't.
> 
> And because we're still limited by INT_MAX (or it's alternative spelling
> BDRV_REQUEST_MAX_SECTORS).
> 
> Maybe we should switch both to uint32_t, but that can be a followup.
> 

OK, thanks!

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev() Eric Blake
  2016-06-16  5:05   ` Fam Zheng
@ 2016-06-20 12:09   ` Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:09 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, Fam Zheng, Stefan Hajnoczi, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 687 bytes --]

On Tue, Jun 14, 2016 at 03:30:23PM -0600, Eric Blake wrote:
> For symmetry with bdrv_aligned_preadv(), assert that the caller
> really has aligned things properly. This requires adding an align
> parameter, which is used now only in the new asserts, but will
> come in handy in a later patch that adds auto-fragmentation to the
> max transfer size, since that value need not always be a multiple
> of the alignment, and therefore must be rounded down.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  block/io.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv() Eric Blake
  2016-06-16  5:08   ` Fam Zheng
@ 2016-06-20 12:10   ` Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:10 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, Fam Zheng, Stefan Hajnoczi, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 741 bytes --]

On Tue, Jun 14, 2016 at 03:30:24PM -0600, Eric Blake wrote:
> We don't pass any flags on to drivers to handle.  Tighten an
> assert to explain why we pass 0 to bdrv_driver_preadv(), and add
> some comments on things to be aware of if we want to turn on
> per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
> document that we may want to consider using unmap during
> copy-on-read operations where the read is all zeroes.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: retitle from 'Honor flags during bdrv_aligned_preadv()', and
> change scope of patch
> ---
>  block/io.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
  2016-06-16  5:10   ` Fam Zheng
@ 2016-06-20 12:12   ` Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:12 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, Fam Zheng, Stefan Hajnoczi, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 613 bytes --]

On Tue, Jun 14, 2016 at 03:30:25PM -0600, Eric Blake wrote:
> If the amount of data to read ends exactly on the total size
> of the bs, then we were wasting time creating a local qiov
> to read the data in preparation for what would normally be
> appending zeroes beyond the end, even though this corner case
> has nothing further to do.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> 
> ---
> v2: no change, add R-b
> ---
>  block/io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 04/17] nbd: Allow larger requests
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests Eric Blake
  2016-06-15 13:37   ` Paolo Bonzini
  2016-06-16  5:12   ` Fam Zheng
@ 2016-06-20 12:13   ` Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:13 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Paolo Bonzini, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 943 bytes --]

On Tue, Jun 14, 2016 at 03:30:26PM -0600, Eric Blake wrote:
> The NBD layer was breaking up request at a limit of 2040 sectors
> (just under 1M) to cater to old qemu-nbd. But the server limit
> was raised to 32M in commit 2d8214885 to match the kernel, more
> than three years ago; and the upstream NBD Protocol is proposing
> documentation that without any explicit communication to state
> otherwise, a client should be able to safely assume that a 32M
> transaction will work.  It is time to rely on the larger sizing,
> and any downstream distro that cares about maximum
> interoperability to older qemu-nbd servers can just tweak the
> value of #define NBD_MAX_SECTORS.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  include/block/nbd.h | 1 +
>  block/nbd-client.c  | 4 ----
>  2 files changed, 1 insertion(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer Eric Blake
  2016-06-15 13:38   ` Paolo Bonzini
  2016-06-16  5:13   ` Fam Zheng
@ 2016-06-20 12:14   ` Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:14 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, kwolf, Paolo Bonzini, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]

On Tue, Jun 14, 2016 at 03:30:27PM -0600, Eric Blake wrote:
> We were basing the advertisement of maximum discard and transfer
> length off of UINT32_MAX, but since the rest of the block layer
> has signed int limits on a transaction, nothing could ever reach
> that maximum, and we risk overflowing an int once things are
> converted to byte-based rather than sector-based limits.  What's
> more, we DO have a much smaller limit: both the current kernel
> and qemu-nbd have a hard limit of 32M on a read or write
> transaction, and while they may also permit up to a full 32 bits
> on a discard transaction, the upstream NBD protocol is proposing
> wording that without any explicit advertisement otherwise,
> clients should limit ALL requests to the same limits as read and
> write, even though the other requests do not actually require as
> many bytes across the wire.  So the better limit to tell the
> block layer is 32M for both values.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  block/nbd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 06/17] iscsi: " Eric Blake
  2016-06-16  5:19   ` Fam Zheng
@ 2016-06-20 12:16   ` Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:16 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, kwolf, qemu-block, Peter Lieven, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]

On Tue, Jun 14, 2016 at 03:30:28PM -0600, Eric Blake wrote:
> The function sector_limits_lun2qemu() returns a value in units of
> the block layer's 512-byte sector, and can be as large as
> 0x40000000, which is much larger than the block layer's inherent
> limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
> handles '0' as a synonym to the inherent limit, and it is nicer
> to return this value than it is to calculate an arbitrary
> maximum, for two reasons: we want to ensure that the block layer
> continues to special-case '0' as 'no limit beyond the inherent
> limits'; and we want to be able to someday expand the block
> layer to allow 64-bit limits, where auditing for uses of
> BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
> artificially constraining iscsi to old block layer limits.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  block/iscsi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits() Eric Blake
  2016-06-16  5:27   ` Fam Zheng
@ 2016-06-21 13:27   ` Kevin Wolf
  2016-06-21 22:17     ` Eric Blake
  1 sibling, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-21 13:27 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Max Reitz

Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> We want to eventually stick request_alignment alongside other
> BlockLimits, but first, we must ensure it is populated at the
> same time as all other limits, rather than being a special case
> that is set only when a block is first opened.
> 
> qemu-iotests 77 is particularly sensitive to the fact that we
> can specify an artificial alignment override in blkdebug, and
> that override must continue to work even when limits are
> refreshed on an already open device.
> 
> A later patch will be altering when bs->request_alignment
> defaults are set, so fall back to sector alignment if we have
> not inherited anything yet.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  block/blkdebug.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 20d25bd..1589fa7 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -37,6 +37,7 @@
>  typedef struct BDRVBlkdebugState {
>      int state;
>      int new_state;
> +    int align;
> 
>      QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
>      QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
> @@ -382,9 +383,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
>      }
> 
>      /* Set request alignment */
> -    align = qemu_opt_get_size(opts, "align", bs->request_alignment);
> -    if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
> -        bs->request_alignment = align;
> +    align = qemu_opt_get_size(opts, "align",
> +                              bs->request_alignment ?: BDRV_SECTOR_SIZE);

In the state as of this patch: How would bs->request_alignment ever be
zero? It is always initialised as 512 (because blkdebug doesn't
implement byte-based interfaces).

Later in this series, you move the initialisation of the field to
bdrv_refresh_limits(). However, the check still doesn't make sense
because now it's always 0 and you always use the BDRV_SECTOR_SIZE
fallback.

I think you should either just unconditionally use BDRV_SECTOR_SIZE or
even better just store 0 in s->align if the option isn't given. Your
implementation of blkdebug_refresh_limits() already leaves the default
bs->request_alignment alone if s->align == 0.

> +    if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
> +        s->align = align;
>      } else {
>          error_setg(errp, "Invalid alignment");
>          ret = -EINVAL;

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based Eric Blake
  2016-06-16  5:42   ` Fam Zheng
@ 2016-06-21 13:50   ` Kevin Wolf
  2016-06-21 22:20     ` Eric Blake
  1 sibling, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-21 13:50 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven, Michael S. Tsirkin

Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_transfer_length
> and opt_transfer_length.  Rename them (dropping the _length suffix)
> so that the compiler will help us catch the change in semantics
> across any rebased code, and improve the documentation.  Use unsigned
> values, so that we don't have to worry about negative values and
> so that bit-twiddling is easier; however, we are still constrained
> by 2^31 of signed int in most APIs.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

> @@ -1738,8 +1742,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>      } else {
>          bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
>      }
> -    bs->bl.opt_transfer_length =
> -        sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
> +    assert(iscsilun->bl.opt_xfer_len < INT_MAX / iscsilun->block_size);
> +    bs->bl.opt_transfer = iscsilun->bl.opt_xfer_len * iscsilun->block_size;
>  }

iscsilun->bl.opt_xfer_len comes directly from libiscsi, and presumably
from the iscsi server, without being checked or sanitised. I don't think
we can assert a specific range of values for it but must assume that it
can be any uint32_t.

We can return an error for a device with a value that we don't like
(even though using the maximum might be just fine), but crashing qemu is
not an option.

> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index aacf132..f2bea85 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -752,7 +752,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>          if (S_ISBLK(st.st_mode)) {
>              int ret = hdev_get_max_transfer_length(s->fd);
>              if (ret >= 0) {
> -                bs->bl.max_transfer_length = ret;
> +                assert(ret <= BDRV_REQUEST_MAX_SECTORS);
> +                bs->bl.max_transfer = ret << BDRV_SECTOR_BITS;
>              }
>          }
>      }

Same thing here.

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 15/17] block: Switch discard " Eric Blake
  2016-06-16  5:46   ` Fam Zheng
@ 2016-06-21 14:05   ` Kevin Wolf
  2016-06-21 22:21     ` Eric Blake
  1 sibling, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-21 14:05 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_discard and
> discard_alignment.  Rename them, using 'pdiscard' as an aid to
> track which remaining discard interfaces need conversion, and so
> that the compiler will help us catch the change in semantics
> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
> is no longer needed; and the BlockLimits type is now completely
> byte-based.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

>      /* maximum number of bytes that can zeroized at once (since it is
> -     * signed, it must be < 2G, if set) */
> +     * signed, it must be < 2G, if set), should be multiple of
> +     * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>      int32_t max_pwrite_zeroes;
> 
>      /* optimal alignment for write zeroes requests in bytes, must be
> -     * power of 2, and less than max_pwrite_zeroes if that is set */
> +     * power of 2, less than max_pwrite_zeroes if that is set, and
> +     * multiple of bs->request_alignment. May be 0 if
> +     * bs->request_alignment is good enough */
>      uint32_t pwrite_zeroes_alignment;

I think you intended to have these as part of some earlier patch, they
are not related to discard.

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits() Eric Blake
  2016-06-16  5:50   ` Fam Zheng
@ 2016-06-21 14:12   ` Kevin Wolf
  2016-06-21 22:24     ` Eric Blake
  1 sibling, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-21 14:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz

Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> The raw block driver was blindly copying all limits from bs->file,
> even though: 1. the main bdrv_refresh_limits() already does this
> for many of gthe limits, and 2. blindly copying from the children
> can weaken any stricter limits that were already inherited from
> the backing dhain during the main bdrv_refresh_limits().  Also,
> the next patch is about to move .request_alignment into
> BlockLimits, and that is a limit that should NOT be copied from
> other layers in the BDS chain.
> 
> Solve the issue by factoring out a new bdrv_merge_limits(),
> and using that function to properly merge limits when comparing
> two BlockDriverState objects.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: new patch
> ---
>  include/block/block.h     |  1 +
>  include/block/block_int.h |  4 ++--
>  include/qemu/typedefs.h   |  1 +
>  block/io.c                | 31 +++++++++++++------------------
>  block/raw_bsd.c           |  4 ++--
>  5 files changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 733a8ec..c1d4648 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -262,6 +262,7 @@ int64_t bdrv_nb_sectors(BlockDriverState *bs);
>  int64_t bdrv_getlength(BlockDriverState *bs);
>  int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
>  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
> +void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src);
>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>  int bdrv_commit(BlockDriverState *bs);
>  int bdrv_change_backing_file(BlockDriverState *bs,

Why does this need to be an external block layer interface? (block.h
instead of block_int.h)

Or in fact...

> diff --git a/block/raw_bsd.c b/block/raw_bsd.c
> index b1d5237..379ce8d 100644
> --- a/block/raw_bsd.c
> +++ b/block/raw_bsd.c
> @@ -152,7 +152,7 @@ static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> 
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
> -    bs->bl = bs->file->bs->bl;
> +    bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
>  }

...isn't this completely redundant because bdrv_refresh_limits() already
called bdrv_merge_limits(&bs->bl, &bs->file->bs->bl)? We could simply
remove the .bdrv_refresh_limits implementation from raw_bsd.

And then bdrv_merge_limits() could even be static (I think factoring it
out is a good idea anyway because it removes some duplication).

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit Eric Blake
  2016-06-16  5:52   ` Fam Zheng
@ 2016-06-21 14:16   ` Kevin Wolf
  2016-06-21 22:26     ` Eric Blake
  1 sibling, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-21 14:16 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> It makes more sense to have ALL block size limit constraints
> in the same struct.  Improve the documentation while at it.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: drop hacks for save/restore of alignment, now that earlier
> patches handled setting up BlockLimits more sanely
> ---
>  include/block/block_int.h | 22 +++++++++++++---------
>  block.c                   |  2 +-
>  block/blkdebug.c          |  4 ++--
>  block/bochs.c             |  2 +-
>  block/cloop.c             |  2 +-
>  block/dmg.c               |  2 +-
>  block/io.c                | 12 ++++++------
>  block/iscsi.c             |  2 +-
>  block/qcow2.c             |  2 +-
>  block/raw-posix.c         | 16 ++++++++--------
>  block/raw-win32.c         |  6 +++---
>  block/vvfat.c             |  2 +-
>  12 files changed, 39 insertions(+), 35 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 88ef826..77f47d9 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -324,6 +324,12 @@ struct BlockDriver {
>  };
> 
>  struct BlockLimits {
> +    /* Alignment requirement, in bytes, for offset/length of I/O
> +     * requests. Must be a power of 2 less than INT_MAX. A value of 0
> +     * defaults to 1 for drivers with modern byte interfaces, and to
> +     * 512 otherwise. */

No, a value of zero probably crashes qemu. The defaults apply when they
aren't overridden by the driver, but this field is always non-zero.

> +    uint32_t request_alignment;
> +
>      /* maximum number of bytes that can be discarded at once (since it
>       * is signed, it must be < 2G, if set), should be multiple of
>       * pdiscard_alignment, but need not be power of 2. May be 0 if no

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Byte-based block limits
  2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
                   ` (16 preceding siblings ...)
  2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit Eric Blake
@ 2016-06-21 14:18 ` Kevin Wolf
  2016-06-21 22:27   ` Eric Blake
  17 siblings, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-21 14:18 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block

Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> BlockLimits is currently an ugly mix of byte limits vs.
> sector limits.  Unify it.  Fix some bugs I found in
> bdrv_aligned_preadv() while at it.
> 
> Prequisite: Kevin's ongoing work to migrate bdrv_aligned_preadv()
> to be byte-based (commit 3de06b2 on his vmstate branch at the
> time of this email, but that gets rebased):
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html
> 
> Trivial contextual conflict in nbd.h with the pull request Paolo
> will soon be posting (both series add a #define near the same
> line; resolution is to add both):
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03333.html
> 
> Also available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-limits-v2

I had a few minor comments, but this is really close.

Patches 1-7 and 9-13 are:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
  2016-06-21 13:27   ` Kevin Wolf
@ 2016-06-21 22:17     ` Eric Blake
  2016-06-22 10:05       ` Kevin Wolf
  0 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-21 22:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2430 bytes --]

On 06/21/2016 07:27 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
>> We want to eventually stick request_alignment alongside other
>> BlockLimits, but first, we must ensure it is populated at the
>> same time as all other limits, rather than being a special case
>> that is set only when a block is first opened.
>>
>> qemu-iotests 77 is particularly sensitive to the fact that we
>> can specify an artificial alignment override in blkdebug, and
>> that override must continue to work even when limits are
>> refreshed on an already open device.
>>
>> A later patch will be altering when bs->request_alignment
>> defaults are set, so fall back to sector alignment if we have
>> not inherited anything yet.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>

>>      /* Set request alignment */
>> -    align = qemu_opt_get_size(opts, "align", bs->request_alignment);
>> -    if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
>> -        bs->request_alignment = align;
>> +    align = qemu_opt_get_size(opts, "align",
>> +                              bs->request_alignment ?: BDRV_SECTOR_SIZE);
> 
> In the state as of this patch: How would bs->request_alignment ever be
> zero? It is always initialised as 512 (because blkdebug doesn't
> implement byte-based interfaces).

Correct.

> 
> Later in this series, you move the initialisation of the field to
> bdrv_refresh_limits(). However, the check still doesn't make sense
> because now it's always 0 and you always use the BDRV_SECTOR_SIZE
> fallback.
> 

Correct again.

> I think you should either just unconditionally use BDRV_SECTOR_SIZE or
> even better just store 0 in s->align if the option isn't given. Your
> implementation of blkdebug_refresh_limits() already leaves the default
> bs->request_alignment alone if s->align == 0.

I like that idea; I guess that means I instead need to tweak the logic
to test if "align" is present in opts (in which case it must be
non-zero), or absent (in which case leaving things as 0 is a nicer
approach than trying to guess which default to stick things to).

> 
>> +    if (align > 0 && align < INT_MAX && is_power_of_2(align)) {

And while I'm at it, align > 0 is redundant with is_power_of_2(align);
will simplify on v3.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based
  2016-06-21 13:50   ` Kevin Wolf
@ 2016-06-21 22:20     ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-21 22:20 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 1977 bytes --]

On 06/21/2016 07:50 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
>> Sector-based limits are awkward to think about; in our on-going
>> quest to move to byte-based interfaces, convert max_transfer_length
>> and opt_transfer_length.  Rename them (dropping the _length suffix)
>> so that the compiler will help us catch the change in semantics
>> across any rebased code, and improve the documentation.  Use unsigned
>> values, so that we don't have to worry about negative values and
>> so that bit-twiddling is easier; however, we are still constrained
>> by 2^31 of signed int in most APIs.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
>> @@ -1738,8 +1742,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>>      } else {
>>          bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
>>      }
>> -    bs->bl.opt_transfer_length =
>> -        sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
>> +    assert(iscsilun->bl.opt_xfer_len < INT_MAX / iscsilun->block_size);
>> +    bs->bl.opt_transfer = iscsilun->bl.opt_xfer_len * iscsilun->block_size;
>>  }
> 
> iscsilun->bl.opt_xfer_len comes directly from libiscsi, and presumably
> from the iscsi server, without being checked or sanitised. I don't think
> we can assert a specific range of values for it but must assume that it
> can be any uint32_t.
> 
> We can return an error for a device with a value that we don't like
> (even though using the maximum might be just fine), but crashing qemu is
> not an option.

I guess there's two possible problems: if the value is not a power of 2,
it affects how we want to use it (we probably ought to raise an error
there); and if it is oversized, we can just silently ignore the limit
(since we can't hit it).  I'll see what I can come up with for v3.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
  2016-06-21 14:05   ` Kevin Wolf
@ 2016-06-21 22:21     ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-21 22:21 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

On 06/21/2016 08:05 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
>> Sector-based limits are awkward to think about; in our on-going
>> quest to move to byte-based interfaces, convert max_discard and
>> discard_alignment.  Rename them, using 'pdiscard' as an aid to
>> track which remaining discard interfaces need conversion, and so
>> that the compiler will help us catch the change in semantics
>> across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
>> is no longer needed; and the BlockLimits type is now completely
>> byte-based.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
>>      /* maximum number of bytes that can zeroized at once (since it is
>> -     * signed, it must be < 2G, if set) */
>> +     * signed, it must be < 2G, if set), should be multiple of
>> +     * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
>>      int32_t max_pwrite_zeroes;
>>
>>      /* optimal alignment for write zeroes requests in bytes, must be
>> -     * power of 2, and less than max_pwrite_zeroes if that is set */
>> +     * power of 2, less than max_pwrite_zeroes if that is set, and
>> +     * multiple of bs->request_alignment. May be 0 if
>> +     * bs->request_alignment is good enough */
>>      uint32_t pwrite_zeroes_alignment;
> 
> I think you intended to have these as part of some earlier patch, they
> are not related to discard.

Well, the patch they should have been in is already part of master
(cf081fca), so I'll just split out the changes to a trivial prereq patch.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  2016-06-21 14:12   ` Kevin Wolf
@ 2016-06-21 22:24     ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-21 22:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Fam Zheng, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]

On 06/21/2016 08:12 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
>> The raw block driver was blindly copying all limits from bs->file,
>> even though: 1. the main bdrv_refresh_limits() already does this
>> for many of gthe limits, and 2. blindly copying from the children
>> can weaken any stricter limits that were already inherited from
>> the backing dhain during the main bdrv_refresh_limits().  Also,
>> the next patch is about to move .request_alignment into
>> BlockLimits, and that is a limit that should NOT be copied from
>> other layers in the BDS chain.
>>
>> Solve the issue by factoring out a new bdrv_merge_limits(),
>> and using that function to properly merge limits when comparing
>> two BlockDriverState objects.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

> 
> Or in fact...
> 
>> diff --git a/block/raw_bsd.c b/block/raw_bsd.c
>> index b1d5237..379ce8d 100644
>> --- a/block/raw_bsd.c
>> +++ b/block/raw_bsd.c
>> @@ -152,7 +152,7 @@ static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>>
>>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>> -    bs->bl = bs->file->bs->bl;
>> +    bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
>>  }

The blind copy was not completely redundant, but the argument that we
don't want a blind copy but instead want to rely on a merge_limits does
indeed mean that...

> 
> ...isn't this completely redundant because bdrv_refresh_limits() already
> called bdrv_merge_limits(&bs->bl, &bs->file->bs->bl)? We could simply
> remove the .bdrv_refresh_limits implementation from raw_bsd.

...this sounds like a better approach.  I'll try it for v3.

> 
> And then bdrv_merge_limits() could even be static (I think factoring it
> out is a good idea anyway because it removes some duplication).

Yes, even if static, it still gets called twice, and makes for a nicer
way to quickly see which direction limits are constrained in
(MIN_NON_ZERO vs. MAX), as well as to make any further tweaks in later
patches easier to do from one spot (for example, we may want to add
logic that clamps an opt limit [which uses MAX logic] to never exceed a
max limit [which uses MIN_NON_ZERO logic]).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit
  2016-06-21 14:16   ` Kevin Wolf
@ 2016-06-21 22:26     ` Eric Blake
  2016-06-22 10:14       ` Kevin Wolf
  0 siblings, 1 reply; 63+ messages in thread
From: Eric Blake @ 2016-06-21 22:26 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]

On 06/21/2016 08:16 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
>> It makes more sense to have ALL block size limit constraints
>> in the same struct.  Improve the documentation while at it.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---

>>  struct BlockLimits {
>> +    /* Alignment requirement, in bytes, for offset/length of I/O
>> +     * requests. Must be a power of 2 less than INT_MAX. A value of 0
>> +     * defaults to 1 for drivers with modern byte interfaces, and to
>> +     * 512 otherwise. */
> 
> No, a value of zero probably crashes qemu. The defaults apply when they
> aren't overridden by the driver, but this field is always non-zero.
> 

Then why does block.c have:

--- a/block.c
+++ b/block.c
@@ -1016,7 +1016,7 @@ static int bdrv_open_common(BlockDriverState *bs,
BdrvChild *file,

     assert(bdrv_opt_mem_align(bs) != 0);
     assert(bdrv_min_mem_align(bs) != 0);
-    assert(is_power_of_2(bs->request_alignment) || bdrv_is_sg(bs));
+    assert(is_power_of_2(bs->bl.request_alignment) || bdrv_is_sg(bs));

That says that if bdrv_is_sg(bs), we can indeed have request_alignment
be 0.  Should I track that down and fix it first, so that
request_alignment is always nonzero?  If so, is using '1' for
bdrv_is_sg(bs) the ideal solution?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Byte-based block limits
  2016-06-21 14:18 ` [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Kevin Wolf
@ 2016-06-21 22:27   ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-21 22:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

On 06/21/2016 08:18 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
>> BlockLimits is currently an ugly mix of byte limits vs.
>> sector limits.  Unify it.  Fix some bugs I found in
>> bdrv_aligned_preadv() while at it.
>>
>> Prequisite: Kevin's ongoing work to migrate bdrv_aligned_preadv()
>> to be byte-based (commit 3de06b2 on his vmstate branch at the
>> time of this email, but that gets rebased):
>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html
>>
>> Trivial contextual conflict in nbd.h with the pull request Paolo
>> will soon be posting (both series add a #define near the same
>> line; resolution is to add both):
>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03333.html
>>
>> Also available as a tag at:
>> git fetch git://repo.or.cz/qemu/ericb.git nbd-limits-v2
> 
> I had a few minor comments, but this is really close.
> 
> Patches 1-7 and 9-13 are:
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Thanks; will respin shortly.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
  2016-06-21 22:17     ` Eric Blake
@ 2016-06-22 10:05       ` Kevin Wolf
  2016-06-22 17:33         ` Eric Blake
  0 siblings, 1 reply; 63+ messages in thread
From: Kevin Wolf @ 2016-06-22 10:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2496 bytes --]

Am 22.06.2016 um 00:17 hat Eric Blake geschrieben:
> On 06/21/2016 07:27 AM, Kevin Wolf wrote:
> > Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> >> We want to eventually stick request_alignment alongside other
> >> BlockLimits, but first, we must ensure it is populated at the
> >> same time as all other limits, rather than being a special case
> >> that is set only when a block is first opened.
> >>
> >> qemu-iotests 77 is particularly sensitive to the fact that we
> >> can specify an artificial alignment override in blkdebug, and
> >> that override must continue to work even when limits are
> >> refreshed on an already open device.
> >>
> >> A later patch will be altering when bs->request_alignment
> >> defaults are set, so fall back to sector alignment if we have
> >> not inherited anything yet.
> >>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> >>      /* Set request alignment */
> >> -    align = qemu_opt_get_size(opts, "align", bs->request_alignment);
> >> -    if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
> >> -        bs->request_alignment = align;
> >> +    align = qemu_opt_get_size(opts, "align",
> >> +                              bs->request_alignment ?: BDRV_SECTOR_SIZE);
> > 
> > In the state as of this patch: How would bs->request_alignment ever be
> > zero? It is always initialised as 512 (because blkdebug doesn't
> > implement byte-based interfaces).
> 
> Correct.
> 
> > 
> > Later in this series, you move the initialisation of the field to
> > bdrv_refresh_limits(). However, the check still doesn't make sense
> > because now it's always 0 and you always use the BDRV_SECTOR_SIZE
> > fallback.
> > 
> 
> Correct again.
> 
> > I think you should either just unconditionally use BDRV_SECTOR_SIZE or
> > even better just store 0 in s->align if the option isn't given. Your
> > implementation of blkdebug_refresh_limits() already leaves the default
> > bs->request_alignment alone if s->align == 0.
> 
> I like that idea; I guess that means I instead need to tweak the logic
> to test if "align" is present in opts (in which case it must be
> non-zero), or absent (in which case leaving things as 0 is a nicer
> approach than trying to guess which default to stick things to).

Except that you don't have to check all of that explicitly, the default
value for absent options is what the third parameter is for:

    align = qemu_opt_get_size(opts, "align", 0);

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit
  2016-06-21 22:26     ` Eric Blake
@ 2016-06-22 10:14       ` Kevin Wolf
  0 siblings, 0 replies; 63+ messages in thread
From: Kevin Wolf @ 2016-06-22 10:14 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, Max Reitz, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, Paolo Bonzini, Peter Lieven

[-- Attachment #1: Type: text/plain, Size: 1947 bytes --]

Am 22.06.2016 um 00:26 hat Eric Blake geschrieben:
> On 06/21/2016 08:16 AM, Kevin Wolf wrote:
> > Am 14.06.2016 um 23:30 hat Eric Blake geschrieben:
> >> It makes more sense to have ALL block size limit constraints
> >> in the same struct.  Improve the documentation while at it.
> >>
> >> Signed-off-by: Eric Blake <eblake@redhat.com>
> >>
> >> ---
> 
> >>  struct BlockLimits {
> >> +    /* Alignment requirement, in bytes, for offset/length of I/O
> >> +     * requests. Must be a power of 2 less than INT_MAX. A value of 0
> >> +     * defaults to 1 for drivers with modern byte interfaces, and to
> >> +     * 512 otherwise. */
> > 
> > No, a value of zero probably crashes qemu. The defaults apply when they
> > aren't overridden by the driver, but this field is always non-zero.
> > 
> 
> Then why does block.c have:
> 
> --- a/block.c
> +++ b/block.c
> @@ -1016,7 +1016,7 @@ static int bdrv_open_common(BlockDriverState *bs,
> BdrvChild *file,
> 
>      assert(bdrv_opt_mem_align(bs) != 0);
>      assert(bdrv_min_mem_align(bs) != 0);
> -    assert(is_power_of_2(bs->request_alignment) || bdrv_is_sg(bs));
> +    assert(is_power_of_2(bs->bl.request_alignment) || bdrv_is_sg(bs));
> 
> That says that if bdrv_is_sg(bs), we can indeed have request_alignment
> be 0.  Should I track that down and fix it first, so that
> request_alignment is always nonzero?  If so, is using '1' for
> bdrv_is_sg(bs) the ideal solution?

Hm, yes, forgot about this. bdrv_is_sg(bs) means that we never use the
I/O functions, so there is no point in specifying any alignment. If we
want to say something about 0 in the comment maybe mention bs->sg
explicitly and that you can't use read/write functions then.

But in fact, I think even sg devices already get a non-zero alignment,
so the second part of the assertion might be redundant. Didn't test it,
though, just had a quick look at the raw-posix code.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
  2016-06-22 10:05       ` Kevin Wolf
@ 2016-06-22 17:33         ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-22 17:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

On 06/22/2016 04:05 AM, Kevin Wolf wrote:

>>>>      /* Set request alignment */
>>>> -    align = qemu_opt_get_size(opts, "align", bs->request_alignment);
>>>> -    if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
>>>> -        bs->request_alignment = align;
>>>> +    align = qemu_opt_get_size(opts, "align",
>>>> +                              bs->request_alignment ?: BDRV_SECTOR_SIZE);
>>>

>>> I think you should either just unconditionally use BDRV_SECTOR_SIZE or
>>> even better just store 0 in s->align if the option isn't given. Your
>>> implementation of blkdebug_refresh_limits() already leaves the default
>>> bs->request_alignment alone if s->align == 0.
>>
>> I like that idea; I guess that means I instead need to tweak the logic
>> to test if "align" is present in opts (in which case it must be
>> non-zero), or absent (in which case leaving things as 0 is a nicer
>> approach than trying to guess which default to stick things to).
> 
> Except that you don't have to check all of that explicitly, the default
> value for absent options is what the third parameter is for:
> 
>     align = qemu_opt_get_size(opts, "align", 0);

Previously, we explicitly reject the user passing in an explicit
'align':0.  If I go with this approach of 0 as the default, then I may
need to tweak docs.  But that sounds better, so I'll try it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer
  2016-06-15 13:38   ` Paolo Bonzini
@ 2016-06-22 20:58     ` Eric Blake
  0 siblings, 0 replies; 63+ messages in thread
From: Eric Blake @ 2016-06-22 20:58 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, kwolf, qemu-stable, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1994 bytes --]

On 06/15/2016 07:38 AM, Paolo Bonzini wrote:
> 
> 
> On 14/06/2016 23:30, Eric Blake wrote:
>> We were basing the advertisement of maximum discard and transfer
>> length off of UINT32_MAX, but since the rest of the block layer
>> has signed int limits on a transaction, nothing could ever reach
>> that maximum, and we risk overflowing an int once things are
>> converted to byte-based rather than sector-based limits.  What's
>> more, we DO have a much smaller limit: both the current kernel
>> and qemu-nbd have a hard limit of 32M on a read or write
>> transaction, and while they may also permit up to a full 32 bits
>> on a discard transaction, the upstream NBD protocol is proposing
>> wording that without any explicit advertisement otherwise,
>> clients should limit ALL requests to the same limits as read and
>> write, even though the other requests do not actually require as
>> many bytes across the wire.  So the better limit to tell the
>> block layer is 32M for both values.
>>

>>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>> -    bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
>> -    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
>> +    bs->bl.max_discard = NBD_MAX_SECTORS;
>> +    bs->bl.max_transfer_length = NBD_MAX_SECTORS;
>>  }
>>
>>  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
>>
> 
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: qemu-stable@nongnu.org

This one won't apply on stable without the previous one; for that
matter, it is the previous one that actually changes behavior (until
later patches land, the block layer is ignoring what NBD advertises to
the block layer), while this one has no discernible effect except
avoiding latent bugs on future patches.  So in my v3 series, I'm only
putting CC: stable on 4/17, not 5.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2016-06-22 20:58 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-14 21:30 [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Eric Blake
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev() Eric Blake
2016-06-16  5:05   ` Fam Zheng
2016-06-20 12:09   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv() Eric Blake
2016-06-16  5:08   ` Fam Zheng
2016-06-20 12:10   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
2016-06-16  5:10   ` Fam Zheng
2016-06-20 12:12   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests Eric Blake
2016-06-15 13:37   ` Paolo Bonzini
2016-06-16  5:12   ` Fam Zheng
2016-06-20 12:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer Eric Blake
2016-06-15 13:38   ` Paolo Bonzini
2016-06-22 20:58     ` Eric Blake
2016-06-16  5:13   ` Fam Zheng
2016-06-20 12:14   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 06/17] iscsi: " Eric Blake
2016-06-16  5:19   ` Fam Zheng
2016-06-20 12:16   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length() Eric Blake
2016-06-16  5:25   ` Fam Zheng
2016-06-16 13:22     ` Paolo Bonzini
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits() Eric Blake
2016-06-16  5:27   ` Fam Zheng
2016-06-21 13:27   ` Kevin Wolf
2016-06-21 22:17     ` Eric Blake
2016-06-22 10:05       ` Kevin Wolf
2016-06-22 17:33         ` Eric Blake
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 09/17] iscsi: " Eric Blake
2016-06-16  5:28   ` Fam Zheng
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 10/17] qcow2: " Eric Blake
2016-06-16  5:28   ` Fam Zheng
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 11/17] raw-win32: " Eric Blake
2016-06-16  5:30   ` Fam Zheng
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 12/17] block: " Eric Blake
2016-06-16  5:31   ` Fam Zheng
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits() Eric Blake
2016-06-16  5:32   ` Fam Zheng
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based Eric Blake
2016-06-16  5:42   ` Fam Zheng
2016-06-21 13:50   ` Kevin Wolf
2016-06-21 22:20     ` Eric Blake
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 15/17] block: Switch discard " Eric Blake
2016-06-16  5:46   ` Fam Zheng
2016-06-16 14:21     ` Eric Blake
2016-06-17  2:28       ` Fam Zheng
2016-06-21 14:05   ` Kevin Wolf
2016-06-21 22:21     ` Eric Blake
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits() Eric Blake
2016-06-16  5:50   ` Fam Zheng
2016-06-16 14:22     ` Eric Blake
2016-06-21 14:12   ` Kevin Wolf
2016-06-21 22:24     ` Eric Blake
2016-06-14 21:30 ` [Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit Eric Blake
2016-06-16  5:52   ` Fam Zheng
2016-06-21 14:16   ` Kevin Wolf
2016-06-21 22:26     ` Eric Blake
2016-06-22 10:14       ` Kevin Wolf
2016-06-21 14:18 ` [Qemu-devel] [PATCH v2 00/17] Byte-based block limits Kevin Wolf
2016-06-21 22:27   ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.