All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/7] add blkdebug tests
@ 2017-02-14 19:25 Eric Blake
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned Eric Blake
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v5

v4 was:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg02834.html

Since then:
- Rebase to master
- Address comments from Max

001/7:[----] [--] 'qcow2: Assert that cluster operations are aligned'
002/7:[0035] [FC] 'qcow2: Discard/zero clusters by byte count'
003/7:[----] [--] 'blkdebug: Sanity check block layer guarantees'
004/7:[0002] [FC] 'blkdebug: Add pass-through write_zero and discard support'
005/7:[----] [--] 'blkdebug: Simplify override logic'
006/7:[0005] [FC] 'blkdebug: Add ability to override unmap geometries'
007/7:[0006] [FC] 'tests: Add coverage for recent block geometry fixes'

Eric Blake (7):
  qcow2: Assert that cluster operations are aligned
  qcow2: Discard/zero clusters by byte count
  blkdebug: Sanity check block layer guarantees
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Simplify override logic
  blkdebug: Add ability to override unmap geometries
  tests: Add coverage for recent block geometry fixes

 qapi/block-core.json       |  25 +++++-
 block/qcow2.h              |   9 +-
 block/blkdebug.c           | 204 +++++++++++++++++++++++++++++++++++++++++++--
 block/qcow2-cluster.c      |  50 ++++++-----
 block/qcow2-snapshot.c     |   7 +-
 block/qcow2.c              |  21 ++---
 tests/qemu-iotests/175     | 114 +++++++++++++++++++++++++
 tests/qemu-iotests/175.out |  49 +++++++++++
 tests/qemu-iotests/group   |   1 +
 9 files changed, 427 insertions(+), 53 deletions(-)
 create mode 100755 tests/qemu-iotests/175
 create mode 100644 tests/qemu-iotests/175.out

-- 
2.9.3

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  2017-02-15 12:25   ` Kevin Wolf
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 2/7] qcow2: Discard/zero clusters by byte count Eric Blake
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

qcow2_discard_clusters() is set up to silently ignore sub-cluster
head or tail on unaligned requests.  However, it is easy to audit
the various callers: qcow2_snapshot_create() has always passed
aligned data since the call was introduced in 1ebf561;
qcow2_co_pdiscard() has passed aligned clusters since commit
ecdbead taught the block layer the preferred discard alignment (the
block layer can still pass sub-cluster values, but those are
handled directly in qcow2_co_pdiscard()); and qcow2_make_empty()
was fixed to pass aligned clusters in commit a3e1505.  Replace
rounding with assertions to hold us to the tighter contract,
eliminating the now-impossible case of an early exit for a
sub-cluster request.

qcow2_zero_clusters() has always been called with cluster-aligned
arguments from its lone caller qcow2_co_pwrite_zeroes() (like
qcow2_co_pdiscard(), the caller takes care of sub-cluster requests
from the block layer; and qcow2_zero_clusters() would have
misbehaved on unaligned requests), but it deserves the same
assertion for symmetry.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v5: no change
v4: new patch
---
 block/qcow2-cluster.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 928c1e2..3304a15 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1521,13 +1521,9 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);

-    /* Round start up and end down */
-    offset = align_offset(offset, s->cluster_size);
-    end_offset = start_of_cluster(s, end_offset);
-
-    if (offset > end_offset) {
-        return 0;
-    }
+    /* Caller must pass aligned values */
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));

     nb_clusters = size_to_clusters(s, end_offset - offset);

@@ -1602,6 +1598,10 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     uint64_t nb_clusters;
     int ret;

+    /* Caller must pass aligned values */
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(nb_sectors, s->cluster_size >> BDRV_SECTOR_BITS));
+
     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
         return -ENOTSUP;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 2/7] qcow2: Discard/zero clusters by byte count
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  2017-02-15 15:46   ` Kevin Wolf
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 3/7] blkdebug: Sanity check block layer guarantees Eric Blake
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2.h          |  9 +++++----
 block/qcow2-cluster.c  | 38 +++++++++++++++++++++-----------------
 block/qcow2-snapshot.c |  7 +++----
 block/qcow2.c          | 21 +++++++++------------
 4 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index f8aeb08..808104c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -544,10 +544,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                          int compressed_size);

 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags);

 int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3304a15..02da8f1 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1511,16 +1511,16 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));
@@ -1531,13 +1531,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     /* Each L2 table is handled by its own loop iteration */
     while (nb_clusters > 0) {
-        ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
-        if (ret < 0) {
+        cleared = discard_single_l2(bs, offset, nb_clusters, type,
+                                    full_discard);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
@@ -1591,16 +1593,17 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

     /* Caller must pass aligned values */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-    assert(QEMU_IS_ALIGNED(nb_sectors, s->cluster_size >> BDRV_SECTOR_BITS));
+    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));

     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
@@ -1608,18 +1611,19 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     }

     /* Each L2 table is handled by its own loop iteration */
-    nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
+    nb_clusters = size_to_clusters(s, bytes);

     s->cache_discards = true;

     while (nb_clusters > 0) {
-        ret = zero_single_l2(bs, offset, nb_clusters, flags);
-        if (ret < 0) {
+        cleared = zero_single_l2(bs, offset, nb_clusters, flags);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0324243..44243e0 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -440,10 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)

     /* The VM state isn't needed any more in the active L1 table; in fact, it
      * hurts by causing expensive COW for the next snapshot. */
-    qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
-                           align_offset(sn->vm_state_size, s->cluster_size)
-                                >> BDRV_SECTOR_BITS,
-                           QCOW2_DISCARD_NEVER, false);
+    qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
+                          align_offset(sn->vm_state_size, s->cluster_size),
+                          QCOW2_DISCARD_NEVER, false);

 #ifdef DEBUG_ALLOC
     {
diff --git a/block/qcow2.c b/block/qcow2.c
index 3e274bd..7daf2ca 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2487,7 +2487,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);

     /* Whatever is left can use real zero clusters */
-    ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
+    ret = qcow2_cluster_zeroize(bs, offset, count, flags);
     qemu_co_mutex_unlock(&s->lock);

     return ret;
@@ -2505,8 +2505,8 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     }

     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
-                                 QCOW2_DISCARD_REQUEST, false);
+    ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
+                                false);
     qemu_co_mutex_unlock(&s->lock);
     return ret;
 }
@@ -2808,9 +2808,8 @@ fail:
 static int qcow2_make_empty(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t start_sector;
-    int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
-                       BDRV_SECTOR_SIZE);
+    uint64_t offset, end_offset;
+    int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
     int l1_clusters, ret = 0;

     l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2827,18 +2826,16 @@ static int qcow2_make_empty(BlockDriverState *bs)

     /* This fallback code simply discards every active cluster; this is slow,
      * but works in all cases */
-    for (start_sector = 0; start_sector < bs->total_sectors;
-         start_sector += sector_step)
+    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
+    for (offset = 0; offset < end_offset; offset += step)
     {
         /* As this function is generally used after committing an external
          * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
          * default action for this kind of discard is to pass the discard,
          * which will ideally result in an actually smaller image file, as
          * is probably desired. */
-        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
-                                     MIN(sector_step,
-                                         bs->total_sectors - start_sector),
-                                     QCOW2_DISCARD_SNAPSHOT, true);
+        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
+                                    QCOW2_DISCARD_SNAPSHOT, true);
         if (ret < 0) {
             break;
         }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 3/7] blkdebug: Sanity check block layer guarantees
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned Eric Blake
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 2/7] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 4/7] blkdebug: Add pass-through write_zero and discard support Eric Blake
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v5: no change
v4: no change
v3: rebase to byte-based interfaces
v2: new patch
---
 block/blkdebug.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index acccf85..37094a2 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -438,6 +438,13 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;

+    /* Sanity check block layer guarantees */
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+    if (bs->bl.max_transfer) {
+        assert(bytes <= bs->bl.max_transfer);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         uint64_t inject_offset = rule->options.inject.offset;

@@ -462,6 +469,13 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;

+    /* Sanity check block layer guarantees */
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+    if (bs->bl.max_transfer) {
+        assert(bytes <= bs->bl.max_transfer);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         uint64_t inject_offset = rule->options.inject.offset;

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 4/7] blkdebug: Add pass-through write_zero and discard support
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
                   ` (2 preceding siblings ...)
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 3/7] blkdebug: Sanity check block layer guarantees Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  2017-02-15 15:53   ` Kevin Wolf
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 5/7] blkdebug: Simplify override logic Eric Blake
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v5: include 2017 copyright
v4: correct error injection to respect byte range, tweak formatting
v3: rebase to byte-based read/write, improve docs on why no
partial write zero passthrough
v2: new patch
---
 block/blkdebug.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 37094a2..b2d5f7d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1,6 +1,7 @@
 /*
  * Block protocol for I/O error injection
  *
+ * Copyright (C) 2016-2017 Red Hat, Inc.
  * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -382,6 +383,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }

+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags;
+    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags;
+
     /* Set request alignment */
     align = qemu_opt_get_size(opts, "align", 0);
     if (align < INT_MAX && is_power_of_2(align)) {
@@ -511,6 +517,84 @@ static int blkdebug_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(bs->file->bs);
 }

+static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
+                                                  int64_t offset, int count,
+                                                  BdrvRequestFlags flags)
+{
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
+    uint32_t align = MAX(bs->bl.request_alignment,
+                         bs->bl.pwrite_zeroes_alignment);
+
+    /* Only pass through requests that are larger than requested
+     * preferred alignment (so that we test the fallback to writes on
+     * unaligned portions), and check that the block layer never hands
+     * us anything crossing an alignment boundary.  */
+    if (count < align) {
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, align));
+    assert(QEMU_IS_ALIGNED(count, align));
+    if (bs->bl.max_pwrite_zeroes) {
+        assert(count <= bs->bl.max_pwrite_zeroes);
+    }
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        uint64_t inject_offset = rule->options.inject.offset;
+
+        if (inject_offset == -1 ||
+            (inject_offset >= offset && inject_offset < offset + count))
+        {
+            break;
+        }
+    }
+
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, rule);
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
+}
+
+static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int count)
+{
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
+    uint32_t align = bs->bl.pdiscard_alignment;
+
+    /* Only pass through requests that are larger than requested
+     * minimum alignment, and ensure that unaligned requests do not
+     * cross optimum discard boundaries. */
+    if (count < bs->bl.request_alignment) {
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
+    if (align && count >= align) {
+        assert(QEMU_IS_ALIGNED(offset, align));
+        assert(QEMU_IS_ALIGNED(count, align));
+    }
+    if (bs->bl.max_pdiscard) {
+        assert(count <= bs->bl.max_pdiscard);
+    }
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        uint64_t inject_offset = rule->options.inject.offset;
+
+        if (inject_offset == -1 ||
+            (inject_offset >= offset && inject_offset < offset + count))
+        {
+            break;
+        }
+    }
+
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, rule);
+    }
+
+    return bdrv_co_pdiscard(bs->file->bs, offset, count);
+}

 static void blkdebug_close(BlockDriverState *bs)
 {
@@ -763,6 +847,8 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_co_preadv         = blkdebug_co_preadv,
     .bdrv_co_pwritev        = blkdebug_co_pwritev,
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
+    .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
+    .bdrv_co_pdiscard       = blkdebug_co_pdiscard,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 5/7] blkdebug: Simplify override logic
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
                   ` (3 preceding siblings ...)
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 4/7] blkdebug: Add pass-through write_zero and discard support Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  2017-02-15 15:58   ` Kevin Wolf
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries Eric Blake
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 7/7] tests: Add coverage for recent block geometry fixes Eric Blake
  6 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Move the errno
assignment into a label that will be reused from more places in
the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v5: no change
v4: fix typo in commit message, move errno assignment
v3: new patch
---
 block/blkdebug.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index b2d5f7d..2996152 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -38,7 +38,7 @@
 typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
-    int align;
+    uint64_t align;

     /* For blkdebug_refresh_filename() */
     char *config_file;
@@ -353,7 +353,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVBlkdebugState *s = bs->opaque;
     QemuOpts *opts;
     Error *local_err = NULL;
-    uint64_t align;
     int ret;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
@@ -389,12 +388,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         bs->file->bs->supported_zero_flags;

     /* Set request alignment */
-    align = qemu_opt_get_size(opts, "align", 0);
-    if (align < INT_MAX && is_power_of_2(align)) {
-        s->align = align;
-    } else if (align) {
-        error_setg(errp, "Invalid alignment");
-        ret = -EINVAL;
+    s->align = qemu_opt_get_size(opts, "align", 0);
+    if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
+        error_setg(errp, "Invalid alignment, align must be integer power of 2");
         goto fail_unref;
     }

@@ -402,6 +398,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     goto out;

 fail_unref:
+    ret = -EINVAL;
     bdrv_unref_child(bs, bs->file);
 out:
     if (ret < 0) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
                   ` (4 preceding siblings ...)
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 5/7] blkdebug: Simplify override logic Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  2017-02-15 16:20   ` Kevin Wolf
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 7/7] tests: Add coverage for recent block geometry fixes Eric Blake
  6 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, Markus Armbruster

Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v5: tweak docs regarding max-transfer minimum
v4: relax 512 byte minimum now that blkdebug is byte-based, fix doc typo
v3: improve legibility of bounds checking, improve docs
v2: new patch
---
 qapi/block-core.json | 25 ++++++++++++++-
 block/blkdebug.c     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 932f5bb..8556c2b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2463,6 +2463,27 @@
 # @align:           #optional required alignment for requests in bytes,
 #                   must be power of 2, or 0 for default
 #
+# @max-transfer:    #optional maximum size for I/O transfers in bytes,
+#                   must be multiple of @align and of the underlying file's
+#                   request alignment (but need not be a power of 2), or
+#                   0 for default (since 2.9)
+#
+# @opt-write-zero:  #optional preferred alignment for write zero requests
+#                   in bytes, must be multiple of @align (but need not be
+#                   a power of 2), or 0 for default (since 2.9)
+#
+# @max-write-zero:  #optional maximum size for write zero requests in bytes,
+#                   must be multiple of @align (but need not be a power of
+#                   2), or 0 for default (since 2.9)
+#
+# @opt-discard:     #optional preferred alignment for discard requests
+#                   in bytes, must be multiple of @align (but need not be
+#                   a power of 2), or 0 for default (since 2.9)
+#
+# @max-discard:     #optional maximum size for discard requests in bytes,
+#                   must be multiple of @align (but need not be a power of
+#                   2), or 0 for default (since 2.9)
+#
 # @inject-error:    #optional array of error injection descriptions
 #
 # @set-state:       #optional array of state-change descriptions
@@ -2472,7 +2493,9 @@
 { 'struct': 'BlockdevOptionsBlkdebug',
   'data': { 'image': 'BlockdevRef',
             '*config': 'str',
-            '*align': 'int',
+            '*align': 'int', '*max-transfer': 'int32',
+            '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
+            '*opt-discard': 'int32', '*max-discard': 'int32',
             '*inject-error': ['BlkdebugInjectErrorOptions'],
             '*set-state': ['BlkdebugSetStateOptions'] } }

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 2996152..c2e4604 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -39,6 +39,11 @@ typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
     uint64_t align;
+    uint64_t max_transfer;
+    uint64_t opt_write_zero;
+    uint64_t max_write_zero;
+    uint64_t opt_discard;
+    uint64_t max_discard;

     /* For blkdebug_refresh_filename() */
     char *config_file;
@@ -343,6 +348,31 @@ static QemuOptsList runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Required alignment in bytes",
         },
+        {
+            .name = "max-transfer",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum transfer size in bytes",
+        },
+        {
+            .name = "opt-write-zero",
+            .type = QEMU_OPT_SIZE,
+            .help = "Optimum write zero alignment in bytes",
+        },
+        {
+            .name = "max-write-zero",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum write zero size in bytes",
+        },
+        {
+            .name = "opt-discard",
+            .type = QEMU_OPT_SIZE,
+            .help = "Optimum discard alignment in bytes",
+        },
+        {
+            .name = "max-discard",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum discard size in bytes",
+        },
         { /* end of list */ }
     },
 };
@@ -354,6 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     QemuOpts *opts;
     Error *local_err = NULL;
     int ret;
+    uint64_t align;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -387,12 +418,55 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
         bs->file->bs->supported_zero_flags;

-    /* Set request alignment */
+    /* Set alignment overrides */
     s->align = qemu_opt_get_size(opts, "align", 0);
     if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
         error_setg(errp, "Invalid alignment, align must be integer power of 2");
         goto fail_unref;
     }
+    align = MAX(s->align, bs->file->bs->bl.request_alignment);
+
+    s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
+    if (s->max_transfer &&
+        (s->max_transfer >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_transfer, align))) {
+        error_setg(errp, "Invalid max-transfer, must be multiple of align");
+        goto fail_unref;
+    }
+
+    s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
+    if (s->opt_write_zero &&
+        (s->opt_write_zero >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
+        error_setg(errp, "Invalid opt-write-zero, not consistent with align");
+        goto fail_unref;
+    }
+
+    s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
+    if (s->max_write_zero &&
+        (s->max_write_zero >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_write_zero,
+                          MAX(s->opt_write_zero, align)))) {
+        error_setg(errp, "Invalid max-write-zero, not consistent with align");
+        goto fail_unref;
+    }
+
+    s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
+    if (s->opt_discard &&
+        (s->opt_discard >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->opt_discard, align))) {
+        error_setg(errp, "Invalid opt-discard, not consistent with align");
+        goto fail_unref;
+    }
+
+    s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
+    if (s->max_discard &&
+        (s->max_discard >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_discard,
+                          MAX(s->opt_discard, align)))) {
+        error_setg(errp, "Invalid max-discard, not consistent with align");
+        goto fail_unref;
+    }

     ret = 0;
     goto out;
@@ -819,6 +893,21 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
     if (s->align) {
         bs->bl.request_alignment = s->align;
     }
+    if (s->max_transfer) {
+        bs->bl.max_transfer = s->max_transfer;
+    }
+    if (s->opt_write_zero) {
+        bs->bl.pwrite_zeroes_alignment = s->opt_write_zero;
+    }
+    if (s->max_write_zero) {
+        bs->bl.max_pwrite_zeroes = s->max_write_zero;
+    }
+    if (s->opt_discard) {
+        bs->bl.pdiscard_alignment = s->opt_discard;
+    }
+    if (s->max_discard) {
+        bs->bl.max_pdiscard = s->max_discard;
+    }
 }

 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v5 7/7] tests: Add coverage for recent block geometry fixes
  2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
                   ` (5 preceding siblings ...)
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries Eric Blake
@ 2017-02-14 19:25 ` Eric Blake
  6 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2017-02-14 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Use blkdebug's new geometry constraints to emulate setups that
have caused recent regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v5: rebase to master by renumbering s/173/175/
v4: clean up some comments, nicer backing file creation, more commit message
v3: make comments tied more to test at hand, rather than the
particular hardware that led to the earlier patches being tested
v2: new patch
---
 tests/qemu-iotests/175     | 114 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/175.out |  49 +++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 164 insertions(+)
 create mode 100755 tests/qemu-iotests/175
 create mode 100644 tests/qemu-iotests/175.out

diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
new file mode 100755
index 0000000..867c9ca
--- /dev/null
+++ b/tests/qemu-iotests/175
@@ -0,0 +1,114 @@
+#!/bin/bash
+#
+# Test corner cases with unusual block geometries
+#
+# Copyright (C) 2016-2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+
+CLUSTER_SIZE=1M
+size=128M
+options=driver=blkdebug,image.driver=qcow2
+
+echo
+echo "== setting up files =="
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+$QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
+_make_test_img -b "$TEST_IMG.base"
+$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
+
+# Limited to 64k max-transfer
+echo
+echo "== constrained alignment and max-transfer =="
+limits=align=4k,max-transfer=64k
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -P 33 1000 128k" -c "read -P 33 1000 128k" | _filter_qemu_io
+
+echo
+echo "== write zero with constrained max-transfer =="
+limits=align=512,max-transfer=64k,opt-write-zero=$CLUSTER_SIZE
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -z 8003584 2093056" | _filter_qemu_io
+
+# non-power-of-2 write-zero/discard alignments
+echo
+echo "== non-power-of-2 write zeroes limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -z 32M 32M" | _filter_qemu_io
+
+echo
+echo "== non-power-of-2 discard limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "discard 80000001 30M" | _filter_qemu_io
+
+echo
+echo "== verify image content =="
+
+function verify_io()
+{
+    if ($QEMU_IMG info -f "$IMGFMT" "$TEST_IMG" |
+	    grep "compat: 0.10" > /dev/null); then
+        # For v2 images, discarded clusters are read from the backing file
+        discarded=11
+    else
+        # Discarded clusters are zeroed for v3 or later
+        discarded=0
+    fi
+
+    echo read -P 22 0 1000
+    echo read -P 33 1000 128k
+    echo read -P 22 132072 7871512
+    echo read -P 0 8003584 2093056
+    echo read -P 22 10096640 23457792
+    echo read -P 0 32M 32M
+    echo read -P 22 64M 13M
+    echo read -P $discarded 77M 29M
+    echo read -P 22 106M 22M
+}
+
+verify_io | $QEMU_IO "$TEST_IMG" | _filter_qemu_io
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+status=0
diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
new file mode 100644
index 0000000..9ac5e33
--- /dev/null
+++ b/tests/qemu-iotests/175.out
@@ -0,0 +1,49 @@
+QA output created by 175
+
+== setting up files ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== constrained alignment and max-transfer ==
+wrote 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== write zero with constrained max-transfer ==
+wrote 2093056/2093056 bytes at offset 8003584
+1.996 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== non-power-of-2 write zeroes limits ==
+wrote 33554432/33554432 bytes at offset 33554432
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== non-power-of-2 discard limits ==
+discard 31457280/31457280 bytes at offset 80000001
+30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify image content ==
+read 1000/1000 bytes at offset 0
+1000 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 7871512/7871512 bytes at offset 132072
+7.507 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2093056/2093056 bytes at offset 8003584
+1.996 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 23457792/23457792 bytes at offset 10096640
+22.371 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 33554432/33554432 bytes at offset 33554432
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 13631488/13631488 bytes at offset 67108864
+13 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 30408704/30408704 bytes at offset 80740352
+29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 23068672/23068672 bytes at offset 111149056
+22 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 985b9a6..bf504d1 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -167,3 +167,4 @@
 172 auto
 173 rw auto
 174 auto
+175 rw auto quick
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned Eric Blake
@ 2017-02-15 12:25   ` Kevin Wolf
  2017-03-06 23:34     ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Wolf @ 2017-02-15 12:25 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, mreitz

Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
> qcow2_discard_clusters() is set up to silently ignore sub-cluster
> head or tail on unaligned requests.  However, it is easy to audit
> the various callers: qcow2_snapshot_create() has always passed
> aligned data since the call was introduced in 1ebf561;
> qcow2_co_pdiscard() has passed aligned clusters since commit
> ecdbead taught the block layer the preferred discard alignment (the
> block layer can still pass sub-cluster values, but those are
> handled directly in qcow2_co_pdiscard()); and qcow2_make_empty()
> was fixed to pass aligned clusters in commit a3e1505.

I don't think this is true for the very part in the image if the image
size isn't cluster aligned:

    ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
                                 MIN(sector_step,
                                     bs->total_sectors - start_sector),
                                 QCOW2_DISCARD_SNAPSHOT, true);

sector_step is alright after commit a3e1505, but bs->total_sectors can
be unaligned.

The other callers look okay.

> Replace
> rounding with assertions to hold us to the tighter contract,
> eliminating the now-impossible case of an early exit for a
> sub-cluster request.
> 
> qcow2_zero_clusters() has always been called with cluster-aligned
> arguments from its lone caller qcow2_co_pwrite_zeroes() (like
> qcow2_co_pdiscard(), the caller takes care of sub-cluster requests
> from the block layer; and qcow2_zero_clusters() would have
> misbehaved on unaligned requests), but it deserves the same
> assertion for symmetry.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> ---
> v5: no change
> v4: new patch
> ---
>  block/qcow2-cluster.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 928c1e2..3304a15 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1521,13 +1521,9 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> 
>      end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
> 
> -    /* Round start up and end down */
> -    offset = align_offset(offset, s->cluster_size);
> -    end_offset = start_of_cluster(s, end_offset);
> -
> -    if (offset > end_offset) {
> -        return 0;
> -    }
> +    /* Caller must pass aligned values */
> +    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> +    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));
> 
>      nb_clusters = size_to_clusters(s, end_offset - offset);
> 
> @@ -1602,6 +1598,10 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
>      uint64_t nb_clusters;
>      int ret;
> 
> +    /* Caller must pass aligned values */
> +    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> +    assert(QEMU_IS_ALIGNED(nb_sectors, s->cluster_size >> BDRV_SECTOR_BITS));

Maybe using s->cluster_sectors is nicer than doing the shift here. Not
too important, the code is replaced in the next patch anyway.

>      /* The zero flag is only supported by version 3 and newer */
>      if (s->qcow_version < 3) {
>          return -ENOTSUP;

Kevin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/7] qcow2: Discard/zero clusters by byte count
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 2/7] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-02-15 15:46   ` Kevin Wolf
  0 siblings, 0 replies; 15+ messages in thread
From: Kevin Wolf @ 2017-02-15 15:46 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, mreitz

Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
> Passing a byte offset, but sector count, when we ultimately
> want to operate on cluster granularity, is madness.  Clean up
> the external interfaces to take both offset and count as bytes,
> while still keeping the assertion added previously that the
> caller must align the values to a cluster.  Then rename things
> to make sure backports don't get confused by changed units:
> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
> we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
> 
> The internal functions still operate on clusters at a time, and
> return an int for number of cleared clusters; but on an image
> with 2M clusters, a single L2 table holds 256k entries that each
> represent a 2M cluster, totalling well over INT_MAX bytes if we
> ever had a request for that many bytes at once.  All our callers
> currently limit themselves to 32-bit bytes (and therefore fewer
> clusters), but by making this function 64-bit clean, we have one
> less place to clean up if we later improve the block layer to
> support 64-bit bytes through all operations (with the block layer
> auto-fragmenting on behalf of more-limited drivers), rather than
> the current state where some interfaces are artificially limited
> to INT_MAX at a time.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/7] blkdebug: Add pass-through write_zero and discard support
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 4/7] blkdebug: Add pass-through write_zero and discard support Eric Blake
@ 2017-02-15 15:53   ` Kevin Wolf
  0 siblings, 0 replies; 15+ messages in thread
From: Kevin Wolf @ 2017-02-15 15:53 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, mreitz

Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
> In order to test the effects of artificial geometry constraints
> on operations like write zero or discard, we first need blkdebug
> to manage these actions.  It also allows us to inject errors on
> those operations, just like we can for read/write/flush.
> 
> We can also test the contract promised by the block layer; namely,
> if a device has specified limits on alignment or maximum size,
> then those limits must be obeyed (for now, the blkdebug driver
> merely inherits limits from whatever it is wrapping, but the next
> patch will further enhance it to allow specific limit overrides).
> 
> This patch intentionally refuses to service requests smaller than
> the requested alignments; this is because an upcoming patch adds
> a qemu-iotest to prove that the block layer is correctly handling
> fragmentation, but the test only works if there is a way to tell
> the difference at artificial alignment boundaries when blkdebug is
> using a larger-than-default alignment.  If we let the blkdebug
> layer always defer to the underlying layer, which potentially has
> a smaller granularity, the iotest will be thwarted.
> 
> Tested by setting up an NBD server with export 'foo', then invoking:
> $ ./qemu-io
> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
> qemu-io> d 0 15M
> qemu-io> w -z 0 15M
> 
> Pre-patch, the server never sees the discard (it was silently
> eaten by the block layer); post-patch it is passed across the
> wire.  Likewise, pre-patch the write is always passed with
> NBD_WRITE (with 15M of zeroes on the wire), while post-patch
> it can utilize NBD_WRITE_ZEROES (for less traffic).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> ---
> v5: include 2017 copyright
> v4: correct error injection to respect byte range, tweak formatting
> v3: rebase to byte-based read/write, improve docs on why no
> partial write zero passthrough
> v2: new patch
> ---
>  block/blkdebug.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 86 insertions(+)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 37094a2..b2d5f7d 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -1,6 +1,7 @@
>  /*
>   * Block protocol for I/O error injection
>   *
> + * Copyright (C) 2016-2017 Red Hat, Inc.
>   * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a copy
> @@ -382,6 +383,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
>          goto out;
>      }
> 
> +    bs->supported_write_flags = BDRV_REQ_FUA &
> +        bs->file->bs->supported_write_flags;
> +    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +        bs->file->bs->supported_zero_flags;
> +
>      /* Set request alignment */
>      align = qemu_opt_get_size(opts, "align", 0);
>      if (align < INT_MAX && is_power_of_2(align)) {
> @@ -511,6 +517,84 @@ static int blkdebug_co_flush(BlockDriverState *bs)
>      return bdrv_co_flush(bs->file->bs);
>  }
> 
> +static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
> +                                                  int64_t offset, int count,
> +                                                  BdrvRequestFlags flags)
> +{
> +    BDRVBlkdebugState *s = bs->opaque;
> +    BlkdebugRule *rule = NULL;
> +    uint32_t align = MAX(bs->bl.request_alignment,
> +                         bs->bl.pwrite_zeroes_alignment);
> +
> +    /* Only pass through requests that are larger than requested
> +     * preferred alignment (so that we test the fallback to writes on
> +     * unaligned portions), and check that the block layer never hands
> +     * us anything crossing an alignment boundary.  */

"crossing an alignment boundary" isn't really what we're interested in
(and also not what your code checks), but just that things are properly
aligned.

> +    if (count < align) {
> +        return -ENOTSUP;
> +    }
> +    assert(QEMU_IS_ALIGNED(offset, align));
> +    assert(QEMU_IS_ALIGNED(count, align));
> +    if (bs->bl.max_pwrite_zeroes) {
> +        assert(count <= bs->bl.max_pwrite_zeroes);
> +    }
> +
> +    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
> +        uint64_t inject_offset = rule->options.inject.offset;
> +
> +        if (inject_offset == -1 ||
> +            (inject_offset >= offset && inject_offset < offset + count))
> +        {
> +            break;
> +        }
> +    }
> +
> +    if (rule && rule->options.inject.error) {
> +        return inject_error(bs, rule);
> +    }
> +
> +    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
> +}
> +
> +static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
> +                                             int64_t offset, int count)
> +{
> +    BDRVBlkdebugState *s = bs->opaque;
> +    BlkdebugRule *rule = NULL;
> +    uint32_t align = bs->bl.pdiscard_alignment;
> +
> +    /* Only pass through requests that are larger than requested
> +     * minimum alignment, and ensure that unaligned requests do not
> +     * cross optimum discard boundaries. */
> +    if (count < bs->bl.request_alignment) {
> +        return -ENOTSUP;
> +    }
> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
> +    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
> +    if (align && count >= align) {
> +        assert(QEMU_IS_ALIGNED(offset, align));
> +        assert(QEMU_IS_ALIGNED(count, align));

Here, in contrast, I think you really want to do what the comment says
(because the contract is that you get head, bulk and tail of a discard
request separately), but the code fails to do so: We could have
count < align, but still cross a optimum discard alignment boundary if
offset is misaligned, too, and we have two partial accesses (i.e. head
and tail in the same call).

> +    }
> +    if (bs->bl.max_pdiscard) {
> +        assert(count <= bs->bl.max_pdiscard);
> +    }
> +
> +    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
> +        uint64_t inject_offset = rule->options.inject.offset;
> +
> +        if (inject_offset == -1 ||
> +            (inject_offset >= offset && inject_offset < offset + count))
> +        {
> +            break;
> +        }
> +    }
> +
> +    if (rule && rule->options.inject.error) {
> +        return inject_error(bs, rule);
> +    }

This piece of code is duplicated in each I/O function. Should we
consider factoring it out?

> +    return bdrv_co_pdiscard(bs->file->bs, offset, count);
> +}

Kevin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 5/7] blkdebug: Simplify override logic
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 5/7] blkdebug: Simplify override logic Eric Blake
@ 2017-02-15 15:58   ` Kevin Wolf
  0 siblings, 0 replies; 15+ messages in thread
From: Kevin Wolf @ 2017-02-15 15:58 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, mreitz

Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
> Rather than store into a local variable, then copy to the struct
> if the value is valid, then reporting errors otherwise, it is
> simpler to just store into the struct and report errors if the
> value is invalid.  This however requires that the struct store
> a 64-bit number, rather than a narrower type.  Move the errno
> assignment into a label that will be reused from more places in
> the next patch.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries
  2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries Eric Blake
@ 2017-02-15 16:20   ` Kevin Wolf
  2017-03-07 21:14     ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Wolf @ 2017-02-15 16:20 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, mreitz, Markus Armbruster

Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
> Make it easier to simulate various unusual hardware setups (for
> example, recent commits 3482b9b and b8d0a98 affect the Dell
> Equallogic iSCSI with its 15M preferred and maximum unmap and
> write zero sizing, or b2f95fe deals with the Linux loopback
> block device having a max_transfer of 64k), by allowing blkdebug
> to wrap any other device with further restrictions on various
> alignments.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v5: tweak docs regarding max-transfer minimum
> v4: relax 512 byte minimum now that blkdebug is byte-based, fix doc typo
> v3: improve legibility of bounds checking, improve docs
> v2: new patch
> ---
>  qapi/block-core.json | 25 ++++++++++++++-
>  block/blkdebug.c     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 114 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 932f5bb..8556c2b 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2463,6 +2463,27 @@
>  # @align:           #optional required alignment for requests in bytes,
>  #                   must be power of 2, or 0 for default
>  #
> +# @max-transfer:    #optional maximum size for I/O transfers in bytes,
> +#                   must be multiple of @align and of the underlying file's
> +#                   request alignment (but need not be a power of 2), or
> +#                   0 for default (since 2.9)
> +#
> +# @opt-write-zero:  #optional preferred alignment for write zero requests
> +#                   in bytes, must be multiple of @align (but need not be
> +#                   a power of 2), or 0 for default (since 2.9)

The code doesn't only check against @align, but also against the
"underlying file's request alignment" like you specified for
@max-transfer.

The same is true for all of the following options.

> +# @max-write-zero:  #optional maximum size for write zero requests in bytes,
> +#                   must be multiple of @align (but need not be a power of
> +#                   2), or 0 for default (since 2.9)

Must be a multiple of @opt-write-zero, too.

> +#
> +# @opt-discard:     #optional preferred alignment for discard requests
> +#                   in bytes, must be multiple of @align (but need not be
> +#                   a power of 2), or 0 for default (since 2.9)
> +#
> +# @max-discard:     #optional maximum size for discard requests in bytes,
> +#                   must be multiple of @align (but need not be a power of
> +#                   2), or 0 for default (since 2.9)

And here of @opt-discard.

>  # @inject-error:    #optional array of error injection descriptions
>  #
>  # @set-state:       #optional array of state-change descriptions
> @@ -2472,7 +2493,9 @@
>  { 'struct': 'BlockdevOptionsBlkdebug',
>    'data': { 'image': 'BlockdevRef',
>              '*config': 'str',
> -            '*align': 'int',
> +            '*align': 'int', '*max-transfer': 'int32',
> +            '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
> +            '*opt-discard': 'int32', '*max-discard': 'int32',

Hm, strictly speaking, this schema allows for negative values. Should we
document that they aren't allowed?

>              '*inject-error': ['BlkdebugInjectErrorOptions'],
>              '*set-state': ['BlkdebugSetStateOptions'] } }
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 2996152..c2e4604 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -39,6 +39,11 @@ typedef struct BDRVBlkdebugState {
>      int state;
>      int new_state;
>      uint64_t align;
> +    uint64_t max_transfer;
> +    uint64_t opt_write_zero;
> +    uint64_t max_write_zero;
> +    uint64_t opt_discard;
> +    uint64_t max_discard;
> 
>      /* For blkdebug_refresh_filename() */
>      char *config_file;
> @@ -343,6 +348,31 @@ static QemuOptsList runtime_opts = {
>              .type = QEMU_OPT_SIZE,
>              .help = "Required alignment in bytes",
>          },
> +        {
> +            .name = "max-transfer",
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Maximum transfer size in bytes",
> +        },
> +        {
> +            .name = "opt-write-zero",
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Optimum write zero alignment in bytes",
> +        },
> +        {
> +            .name = "max-write-zero",
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Maximum write zero size in bytes",
> +        },
> +        {
> +            .name = "opt-discard",
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Optimum discard alignment in bytes",
> +        },
> +        {
> +            .name = "max-discard",
> +            .type = QEMU_OPT_SIZE,
> +            .help = "Maximum discard size in bytes",
> +        },
>          { /* end of list */ }
>      },
>  };
> @@ -354,6 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
>      QemuOpts *opts;
>      Error *local_err = NULL;
>      int ret;
> +    uint64_t align;
> 
>      opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
>      qemu_opts_absorb_qdict(opts, options, &local_err);
> @@ -387,12 +418,55 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
>      bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
>          bs->file->bs->supported_zero_flags;
> 
> -    /* Set request alignment */
> +    /* Set alignment overrides */
>      s->align = qemu_opt_get_size(opts, "align", 0);
>      if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
>          error_setg(errp, "Invalid alignment, align must be integer power of 2");
>          goto fail_unref;
>      }
> +    align = MAX(s->align, bs->file->bs->bl.request_alignment);
> +
> +    s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
> +    if (s->max_transfer &&
> +        (s->max_transfer >= INT_MAX ||
> +         !QEMU_IS_ALIGNED(s->max_transfer, align))) {
> +        error_setg(errp, "Invalid max-transfer, must be multiple of align");

It's not that I'm generally a friend of unspecific messages, but in this
case I think I would prefer being unspecific to the potentially wrong
error message that is returned here. We're checking multiple conditions
and the error message mentions only one of them as the reason, which may
or may not be the right one.

This is the same for all new options.

> +        goto fail_unref;
> +    }
> +
> +    s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
> +    if (s->opt_write_zero &&
> +        (s->opt_write_zero >= INT_MAX ||
> +         !QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
> +        error_setg(errp, "Invalid opt-write-zero, not consistent with align");
> +        goto fail_unref;
> +    }
> +
> +    s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
> +    if (s->max_write_zero &&
> +        (s->max_write_zero >= INT_MAX ||
> +         !QEMU_IS_ALIGNED(s->max_write_zero,
> +                          MAX(s->opt_write_zero, align)))) {
> +        error_setg(errp, "Invalid max-write-zero, not consistent with align");
> +        goto fail_unref;
> +    }
> +
> +    s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
> +    if (s->opt_discard &&
> +        (s->opt_discard >= INT_MAX ||
> +         !QEMU_IS_ALIGNED(s->opt_discard, align))) {
> +        error_setg(errp, "Invalid opt-discard, not consistent with align");
> +        goto fail_unref;
> +    }
> +
> +    s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
> +    if (s->max_discard &&
> +        (s->max_discard >= INT_MAX ||
> +         !QEMU_IS_ALIGNED(s->max_discard,
> +                          MAX(s->opt_discard, align)))) {
> +        error_setg(errp, "Invalid max-discard, not consistent with align");
> +        goto fail_unref;
> +    }

Kevin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned
  2017-02-15 12:25   ` Kevin Wolf
@ 2017-03-06 23:34     ` Eric Blake
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2017-03-06 23:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

On 02/15/2017 06:25 AM, Kevin Wolf wrote:
> Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
>> qcow2_discard_clusters() is set up to silently ignore sub-cluster
>> head or tail on unaligned requests.  However, it is easy to audit
>> the various callers: qcow2_snapshot_create() has always passed
>> aligned data since the call was introduced in 1ebf561;
>> qcow2_co_pdiscard() has passed aligned clusters since commit
>> ecdbead taught the block layer the preferred discard alignment (the
>> block layer can still pass sub-cluster values, but those are
>> handled directly in qcow2_co_pdiscard()); and qcow2_make_empty()
>> was fixed to pass aligned clusters in commit a3e1505.
> 
> I don't think this is true for the very part in the image if the image
> size isn't cluster aligned:
> 
>     ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
>                                  MIN(sector_step,
>                                      bs->total_sectors - start_sector),
>                                  QCOW2_DISCARD_SNAPSHOT, true);
> 
> sector_step is alright after commit a3e1505, but bs->total_sectors can
> be unaligned.

Indeed; a slight tweak to qemu-iotests/097 exposes the problem.  I'll
include that in my next spin (although I note that Dan has patches on
the list to split 097 into two tests, in order to give qcow coverage
rather than just qcow2).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries
  2017-02-15 16:20   ` Kevin Wolf
@ 2017-03-07 21:14     ` Eric Blake
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2017-03-07 21:14 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, mreitz, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 2766 bytes --]

On 02/15/2017 10:20 AM, Kevin Wolf wrote:
> Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
>> Make it easier to simulate various unusual hardware setups (for
>> example, recent commits 3482b9b and b8d0a98 affect the Dell
>> Equallogic iSCSI with its 15M preferred and maximum unmap and
>> write zero sizing, or b2f95fe deals with the Linux loopback
>> block device having a max_transfer of 64k), by allowing blkdebug
>> to wrap any other device with further restrictions on various
>> alignments.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>>  # @inject-error:    #optional array of error injection descriptions
>>  #
>>  # @set-state:       #optional array of state-change descriptions
>> @@ -2472,7 +2493,9 @@
>>  { 'struct': 'BlockdevOptionsBlkdebug',
>>    'data': { 'image': 'BlockdevRef',
>>              '*config': 'str',
>> -            '*align': 'int',
>> +            '*align': 'int', '*max-transfer': 'int32',
>> +            '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
>> +            '*opt-discard': 'int32', '*max-discard': 'int32',
> 
> Hm, strictly speaking, this schema allows for negative values. Should we
> document that they aren't allowed?

Doesn't power of two imply positive? But yes, I can tweak the wording.


>> -    /* Set request alignment */
>> +    /* Set alignment overrides */
>>      s->align = qemu_opt_get_size(opts, "align", 0);
>>      if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
>>          error_setg(errp, "Invalid alignment, align must be integer power of 2");
>>          goto fail_unref;
>>      }
>> +    align = MAX(s->align, bs->file->bs->bl.request_alignment);
>> +
>> +    s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
>> +    if (s->max_transfer &&
>> +        (s->max_transfer >= INT_MAX ||
>> +         !QEMU_IS_ALIGNED(s->max_transfer, align))) {
>> +        error_setg(errp, "Invalid max-transfer, must be multiple of align");
> 
> It's not that I'm generally a friend of unspecific messages, but in this
> case I think I would prefer being unspecific to the potentially wrong
> error message that is returned here. We're checking multiple conditions
> and the error message mentions only one of them as the reason, which may
> or may not be the right one.
> 
> This is the same for all new options.

I'm leaning towards "Cannot meet constraints with max-transfer %lld" as
being non-specific enough to cover all the cases while still describing
the problem to be fixed.  Any other wording suggestions would be
welcome, if someone has a particular bikeshed color they like.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-03-07 21:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-14 19:25 [Qemu-devel] [PATCH v5 0/7] add blkdebug tests Eric Blake
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 1/7] qcow2: Assert that cluster operations are aligned Eric Blake
2017-02-15 12:25   ` Kevin Wolf
2017-03-06 23:34     ` Eric Blake
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 2/7] qcow2: Discard/zero clusters by byte count Eric Blake
2017-02-15 15:46   ` Kevin Wolf
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 3/7] blkdebug: Sanity check block layer guarantees Eric Blake
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 4/7] blkdebug: Add pass-through write_zero and discard support Eric Blake
2017-02-15 15:53   ` Kevin Wolf
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 5/7] blkdebug: Simplify override logic Eric Blake
2017-02-15 15:58   ` Kevin Wolf
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries Eric Blake
2017-02-15 16:20   ` Kevin Wolf
2017-03-07 21:14     ` Eric Blake
2017-02-14 19:25 ` [Qemu-devel] [PATCH v5 7/7] tests: Add coverage for recent block geometry fixes Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.