[Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing
@ 2018-11-12  7:06 Marc Olson
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types Marc Olson
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Marc Olson @ 2018-11-12  7:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: jsnow, qemu-block, Marc Olson, Kevin Wolf, Max Reitz

If 'once' is specified, the rule should execute just once, regardless if
it is supposed to return an error or not. Take the example where you
want the first IO to an LBA to succeed, but subsequent IOs to fail. You
could either use state transitions, or create two rules, one with
error = 0 and once set to true, and one with a non-zero error.

Signed-off-by: Marc Olson <marcolso@amazon.com>
---
 block/blkdebug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 0759452..327049b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -488,7 +488,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
         }
     }
 
-    if (!rule || !rule->options.inject.error) {
+    if (!rule) {
         return 0;
     }
 
@@ -500,7 +500,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
         remove_rule(rule);
     }
 
-    if (!immediately) {
+    if (error && !immediately) {
         aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
         qemu_coroutine_yield();
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types
  2018-11-12  7:06 [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Marc Olson
@ 2018-11-12  7:06 ` Marc Olson
  2018-11-13 23:22   ` John Snow
  2019-01-11 14:41   ` Max Reitz
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type Marc Olson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Marc Olson @ 2018-11-12  7:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: jsnow, qemu-block, Marc Olson, Kevin Wolf, Max Reitz

Break out the more common parts of the BlkdebugRule struct, and make
rule_check() more explicit about operating only on error injection types
so that additional rule types can be added in the future.

Signed-off-by: Marc Olson <marcolso@amazon.com>
---
 block/blkdebug.c | 59 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 31 insertions(+), 28 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 327049b..7739849 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -73,13 +73,13 @@ typedef struct BlkdebugRule {
     BlkdebugEvent event;
     int action;
     int state;
+    int once;
+    int64_t offset;
     union {
         struct {
             int error;
             int immediately;
-            int once;
-            int64_t offset;
-        } inject;
+        } inject_error;
         struct {
             int new_state;
         } set_state;
@@ -182,16 +182,16 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
         .state  = qemu_opt_get_number(opts, "state", 0),
     };
 
+    rule->once = qemu_opt_get_bool(opts, "once", 0);
+    sector = qemu_opt_get_number(opts, "sector", -1);
+    rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
+
     /* Parse action-specific options */
     switch (d->action) {
     case ACTION_INJECT_ERROR:
-        rule->options.inject.error = qemu_opt_get_number(opts, "errno", EIO);
-        rule->options.inject.once  = qemu_opt_get_bool(opts, "once", 0);
-        rule->options.inject.immediately =
+        rule->options.inject_error.error = qemu_opt_get_number(opts, "errno", EIO);
+        rule->options.inject_error.immediately =
             qemu_opt_get_bool(opts, "immediately", 0);
-        sector = qemu_opt_get_number(opts, "sector", -1);
-        rule->options.inject.offset =
-            sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
         break;
 
     case ACTION_SET_STATE:
@@ -474,38 +474,41 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
 {
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;
+    BlkdebugRule *error_rule = NULL;
     int error;
     bool immediately;
+    int ret = 0;
 
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        uint64_t inject_offset = rule->options.inject.offset;
-
-        if (inject_offset == -1 ||
-            (bytes && inject_offset >= offset &&
-             inject_offset < offset + bytes))
+        if (rule->offset == -1 ||
+            (bytes && rule->offset >= offset &&
+             rule->offset < offset + bytes))
         {
-            break;
+            if (rule->action == ACTION_INJECT_ERROR) {
+                error_rule = rule;
+                break;
+            }
         }
     }
 
-    if (!rule) {
-        return 0;
-    }
+    if (error_rule) {
+        immediately = error_rule->options.inject_error.immediately;
+        error = error_rule->options.inject_error.error;
 
-    immediately = rule->options.inject.immediately;
-    error = rule->options.inject.error;
+        if (error_rule->once) {
+            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
+            remove_rule(error_rule);
+        }
 
-    if (rule->options.inject.once) {
-        QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
-        remove_rule(rule);
-    }
+        if (error && !immediately) {
+            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+            qemu_coroutine_yield();
+        }
 
-    if (error && !immediately) {
-        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
-        qemu_coroutine_yield();
+        ret = -error;
     }
 
-    return -error;
+    return ret;
 }
 
 static int coroutine_fn
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2018-11-12  7:06 [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Marc Olson
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types Marc Olson
@ 2018-11-12  7:06 ` Marc Olson
  2018-11-13 23:57   ` John Snow
  2019-01-11 15:00   ` Max Reitz
  2018-11-12 11:15 ` [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Dongli Zhang
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Marc Olson @ 2018-11-12  7:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: jsnow, qemu-block, Marc Olson, Kevin Wolf, Max Reitz, Eric Blake,
	Markus Armbruster

Add a new rule type for blkdebug that instead of returning an error, can
inject latency to an IO.

Signed-off-by: Marc Olson <marcolso@amazon.com>
---
 block/blkdebug.c           | 79 +++++++++++++++++++++++++++++++++++++++++++---
 docs/devel/blkdebug.txt    | 35 ++++++++++++++------
 qapi/block-core.json       | 31 ++++++++++++++++++
 tests/qemu-iotests/071     | 63 ++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/071.out | 31 ++++++++++++++++++
 5 files changed, 226 insertions(+), 13 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 7739849..6b1f2d6 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -65,6 +65,7 @@ typedef struct BlkdebugSuspendedReq {
 
 enum {
     ACTION_INJECT_ERROR,
+    ACTION_INJECT_DELAY,
     ACTION_SET_STATE,
     ACTION_SUSPEND,
 };
@@ -81,6 +82,9 @@ typedef struct BlkdebugRule {
             int immediately;
         } inject_error;
         struct {
+            int64_t latency;
+        } delay;
+        struct {
             int new_state;
         } set_state;
         struct {
@@ -123,6 +127,34 @@ static QemuOptsList inject_error_opts = {
     },
 };
 
+static QemuOptsList inject_delay_opts = {
+    .name = "inject-delay",
+    .head = QTAILQ_HEAD_INITIALIZER(inject_delay_opts.head),
+    .desc = {
+        {
+            .name = "event",
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = "state",
+            .type = QEMU_OPT_NUMBER,
+        },
+        {
+            .name = "latency",
+            .type = QEMU_OPT_NUMBER,
+        },
+        {
+            .name = "sector",
+            .type = QEMU_OPT_NUMBER,
+        },
+        {
+            .name = "once",
+            .type = QEMU_OPT_BOOL,
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList set_state_opts = {
     .name = "set-state",
     .head = QTAILQ_HEAD_INITIALIZER(set_state_opts.head),
@@ -145,6 +177,7 @@ static QemuOptsList set_state_opts = {
 
 static QemuOptsList *config_groups[] = {
     &inject_error_opts,
+    &inject_delay_opts,
     &set_state_opts,
     NULL
 };
@@ -194,6 +227,11 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
             qemu_opt_get_bool(opts, "immediately", 0);
         break;
 
+    case ACTION_INJECT_DELAY:
+        rule->options.delay.latency =
+            qemu_opt_get_number(opts, "latency", 100) * SCALE_US;
+        break;
+
     case ACTION_SET_STATE:
         rule->options.set_state.new_state =
             qemu_opt_get_number(opts, "new_state", 0);
@@ -226,6 +264,12 @@ static void remove_rule(BlkdebugRule *rule)
     g_free(rule);
 }
 
+static void remove_active_rule(BDRVBlkdebugState *s, BlkdebugRule *rule)
+{
+    QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
+    remove_rule(rule);
+}
+
 static int read_config(BDRVBlkdebugState *s, const char *filename,
                        QDict *options, Error **errp)
 {
@@ -264,6 +308,14 @@ static int read_config(BDRVBlkdebugState *s, const char *filename,
         goto fail;
     }
 
+    d.action = ACTION_INJECT_DELAY;
+    qemu_opts_foreach(&inject_delay_opts, add_rule, &d, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+
     d.action = ACTION_SET_STATE;
     qemu_opts_foreach(&set_state_opts, add_rule, &d, &local_err);
     if (local_err) {
@@ -275,6 +327,7 @@ static int read_config(BDRVBlkdebugState *s, const char *filename,
     ret = 0;
 fail:
     qemu_opts_reset(&inject_error_opts);
+    qemu_opts_reset(&inject_delay_opts);
     qemu_opts_reset(&set_state_opts);
     if (f) {
         fclose(f);
@@ -474,7 +527,8 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
 {
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;
-    BlkdebugRule *error_rule = NULL;
+    BlkdebugRule *error_rule = NULL, *delay_rule = NULL;
+    int64_t latency;
     int error;
     bool immediately;
     int ret = 0;
@@ -484,20 +538,36 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
             (bytes && rule->offset >= offset &&
              rule->offset < offset + bytes))
         {
-            if (rule->action == ACTION_INJECT_ERROR) {
+            if (!error_rule && rule->action == ACTION_INJECT_ERROR) {
                 error_rule = rule;
+            } else if (!delay_rule && rule->action == ACTION_INJECT_DELAY) {
+                delay_rule = rule;
+            }
+
+            if (error_rule && delay_rule) {
                 break;
             }
         }
     }
 
+    if (delay_rule) {
+        latency = delay_rule->options.delay.latency;
+
+        if (delay_rule->once) {
+            remove_active_rule(s, delay_rule);
+        }
+
+        if (latency != 0) {
+            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
+        }
+    }
+
     if (error_rule) {
         immediately = error_rule->options.inject_error.immediately;
         error = error_rule->options.inject_error.error;
 
         if (error_rule->once) {
-            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
-            remove_rule(error_rule);
+            remove_active_rule(s, error_rule);
         }
 
         if (error && !immediately) {
@@ -697,6 +767,7 @@ static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
     /* Take the action */
     switch (rule->action) {
     case ACTION_INJECT_ERROR:
+    case ACTION_INJECT_DELAY:
         if (!injected) {
             QSIMPLEQ_INIT(&s->active_rules);
             injected = true;
diff --git a/docs/devel/blkdebug.txt b/docs/devel/blkdebug.txt
index 43d8e8f..1719835 100644
--- a/docs/devel/blkdebug.txt
+++ b/docs/devel/blkdebug.txt
@@ -24,7 +24,7 @@ This way, all error paths can be tested to make sure they are correct.
 Rules
 -----
 The blkdebug block driver takes a list of "rules" that tell the error injection
-engine when to fail an I/O request.
+engine when to either fail or add latency to an I/O request.
 
 Each I/O request is evaluated against the rules.  If a rule matches the request
 then its "action" is executed.
@@ -33,24 +33,35 @@ Rules can be placed in a configuration file; the configuration file
 follows the same .ini-like format used by QEMU's -readconfig option, and
 each section of the file represents a rule.
 
-The following configuration file defines a single rule:
+The following configuration file defines multiple rules:
 
   $ cat blkdebug.conf
   [inject-error]
   event = "read_aio"
   errno = "28"
 
-This rule fails all aio read requests with ENOSPC (28).  Note that the errno
-value depends on the host.  On Linux, see
+  [inject-delay]
+  event = "read_aio"
+  sector = "2048"
+  latency = "500000"
+
+The error rule fails all aio read requests with ENOSPC (28).  Note that the
+errno value depends on the host.  On Linux, see
 /usr/include/asm-generic/errno-base.h for errno values.
 
+The delay rule adds 500 ms of latency to a read I/O request containing sector
+2048.
+
+An error rule and a delay rule can overlap, and both will execute. Only one
+rule of a given type will be executed for each I/O.
+
 Invoke QEMU as follows:
 
   $ qemu-system-x86_64
         -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
         -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
 
-Rules support the following attributes:
+All rules support the following attributes:
 
   event - which type of operation to match (e.g. read_aio, write_aio,
           flush_to_os, flush_to_disk).  See the "Events" section for
@@ -60,21 +71,27 @@ Rules support the following attributes:
           rule to match.  See the "State transitions" section for information
           on states.
 
-  errno - the numeric errno value to return when a request matches this rule.
-          The errno values depend on the host since the numeric values are not
-          standarized in the POSIX specification.
-
   sector - (optional) a sector number that the request must overlap in order to
            match this rule
 
   once - (optional, default "off") only execute this action on the first
          matching request
 
+Error injection rules support the following additional attributes:
+
+  errno - the numeric errno value to return when a request matches this rule.
+          The errno values depend on the host since the numeric values are not
+          standarized in the POSIX specification.
+
   immediately - (optional, default "off") return a NULL BlockAIOCB
                 pointer and fail without an errno instead.  This
                 exercises the code path where BlockAIOCB fails and the
                 caller's BlockCompletionFunc is not invoked.
 
+Delay rules support the following additional attribute:
+
+  latency - the delay to add to an I/O request, in microseconds.
+
 Events
 ------
 Block drivers provide information about the type of I/O request they are about
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d4fe710..72f7861 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3057,6 +3057,34 @@
             '*immediately': 'bool' } }
 
 ##
+# @BlkdebugDelayOptions:
+#
+# Describes a single latency injection for blkdebug.
+#
+# @event:       trigger event
+#
+# @state:       the state identifier blkdebug needs to be in to
+#               actually trigger the event; defaults to "any"
+#
+# @latency:     The delay to add to an I/O, in microseconds.
+#
+# @sector:      specifies the sector index which has to be affected
+#               in order to actually trigger the event; defaults to "any
+#               sector"
+#
+# @once:        disables further events after this one has been
+#               triggered; defaults to false
+#
+# Since: 3.1
+##
+{ 'struct': 'BlkdebugDelayOptions',
+  'data': { 'event': 'BlkdebugEvent',
+            '*state': 'int',
+            '*latency': 'int',
+            '*sector': 'int',
+            '*once': 'bool' } }
+
+##
 # @BlkdebugSetStateOptions:
 #
 # Describes a single state-change event for blkdebug.
@@ -3115,6 +3143,8 @@
 #
 # @inject-error:    array of error injection descriptions
 #
+# @inject-delay:    array of delay injection descriptions
+#
 # @set-state:       array of state-change descriptions
 #
 # Since: 2.9
@@ -3126,6 +3156,7 @@
             '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
             '*opt-discard': 'int32', '*max-discard': 'int32',
             '*inject-error': ['BlkdebugInjectErrorOptions'],
+            '*inject-delay': ['BlkdebugDelayOptions'],
             '*set-state': ['BlkdebugSetStateOptions'] } }
 
 ##
diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
index 48b4955..976f747 100755
--- a/tests/qemu-iotests/071
+++ b/tests/qemu-iotests/071
@@ -100,6 +100,69 @@ $QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
          -c 'read -P 42 0x38000 512'
 
 echo
+echo "=== Testing blkdebug latency through filename ==="
+echo
+
+$QEMU_IO -c "open -o file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000 $TEST_IMG" \
+         -c 'aio_write -P 42 0x28000 512' \
+         -c 'aio_read -P 42 0x38000 512' \
+         | _filter_qemu_io
+
+echo
+echo "=== Testing blkdebug latency through file blockref ==="
+echo
+
+$QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG" \
+         -c 'aio_write -P 42 0x28000 512' \
+         -c 'aio_read -P 42 0x38000 512' \
+         | _filter_qemu_io
+
+# Using QMP is synchronous by default, so even though we would
+# expect reordering due to using the aio_* commands, they are
+# not. The purpose of this test is to verify that the driver
+# can be setup via QMP, and IO can complete. See the qemu-io
+# test above to prove delay functionality
+echo
+echo "=== Testing blkdebug on existing block device ==="
+echo
+
+run_qemu <<EOF
+{ "execute": "qmp_capabilities" }
+{ "execute": "blockdev-add",
+    "arguments": {
+        "node-name": "drive0",
+        "driver": "file",
+        "filename": "$TEST_IMG"
+    }
+}
+{ "execute": "blockdev-add",
+    "arguments": {
+        "driver": "$IMGFMT",
+        "node-name": "drive0-debug",
+        "file": {
+            "driver": "blkdebug",
+            "image": "drive0",
+            "inject-delay": [{
+                "event": "write_aio",
+                "latency": 10000
+            }]
+        }
+    }
+}
+{ "execute": "human-monitor-command",
+    "arguments": {
+        "command-line": 'qemu-io drive0-debug "aio_write 0 512"'
+    }
+}
+{ "execute": "human-monitor-command",
+    "arguments": {
+        "command-line": 'qemu-io drive0-debug "aio_read 0 512"'
+    }
+}
+{ "execute": "quit" }
+EOF
+
+echo
 echo "=== Testing blkdebug on existing block device ==="
 echo
 
diff --git a/tests/qemu-iotests/071.out b/tests/qemu-iotests/071.out
index 1d5e28d..1952990 100644
--- a/tests/qemu-iotests/071.out
+++ b/tests/qemu-iotests/071.out
@@ -36,6 +36,37 @@ read failed: Input/output error
 
 read failed: Input/output error
 
+=== Testing blkdebug latency through filename ===
+
+read 512/512 bytes at offset 229376
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 163840
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing blkdebug latency through file blockref ===
+
+read 512/512 bytes at offset 229376
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 163840
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing blkdebug on existing block device ===
+
+Testing:
+QMP_VERSION
+{"return": {}}
+{"return": {}}
+{"return": {}}
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"return": ""}
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"return": ""}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+
+
 === Testing blkdebug on existing block device ===
 
 Testing:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing
  2018-11-12  7:06 [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Marc Olson
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types Marc Olson
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type Marc Olson
@ 2018-11-12 11:15 ` Dongli Zhang
  2018-11-13 23:00 ` John Snow
  2019-01-11 14:36 ` Max Reitz
  4 siblings, 0 replies; 17+ messages in thread
From: Dongli Zhang @ 2018-11-12 11:15 UTC (permalink / raw)
  To: Marc Olson; +Cc: qemu-devel, Kevin Wolf, jsnow, qemu-block, Max Reitz

Hi Marc,

When I play with the v3 patch set, the qemu hangs again and I need to kill it
with "kill -9".

I got below from guest:

[  104.828127] nvme nvme0: I/O 52 QID 1 timeout, aborting
[  104.828470] nvme nvme0: Abort status: 0x4001

nvme abort is not supported by qemu and therefore 0x4001 (NVME_INVALID_OPCODE |
NVME_DNR) is returned. qemu does not restart the device.


Below is the environment:

host: most recent qemu (master branch) with 3 patches applied, on top of Ubuntu
16.04.4 with 4.15.0-36-generic

guest: Ubuntu 16.04.4 with 4.13.0-39-generic.


The image to emulate nvme is a 128MB raw image (ext4). I randomly run "dd
if=/dev/zero of=output bs=1M count=30" on the nvme raw image to hit the
sector="40960" with "inject-delay" enabled as shown below:

[inject-delay]
event = "write_aio"
latency = "9999999999"
sector = "40960"


sudo ./x86_64-softmmu/qemu-system-x86_64 \
-drive file=os.img,format=raw,if=none,id=disk0 \
-device
virtio-blk-pci,drive=disk0,id=device0,num-queues=2,iothread=io1,bootindex=0 \
-object iothread,id=io1 \
-smp 2 -m 2000M -enable-kvm  -vnc :0 -monitor stdio \
-device nvme,drive=nvmedrive,serial=deadbeaf1 \
-drive file=blkdebug:blkdebug.config:nvme.img,format=raw,if=none,id=nvmedrive \
-net nic -net user,hostfwd=tcp::5022-:22

I will debug where it hangs.

Dongli Zhang


On 11/12/2018 03:06 PM, Marc Olson via Qemu-devel wrote:
> If 'once' is specified, the rule should execute just once, regardless if
> it is supposed to return an error or not. Take the example where you
> want the first IO to an LBA to succeed, but subsequent IOs to fail. You
> could either use state transitions, or create two rules, one with
> error = 0 and once set to true, and one with a non-zero error.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 0759452..327049b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -488,7 +488,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>          }
>      }
>  
> -    if (!rule || !rule->options.inject.error) {
> +    if (!rule) {
>          return 0;
>      }
>  
> @@ -500,7 +500,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>          remove_rule(rule);
>      }
>  
> -    if (!immediately) {
> +    if (error && !immediately) {
>          aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
>          qemu_coroutine_yield();
>      }
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing
  2018-11-12  7:06 [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Marc Olson
                   ` (2 preceding siblings ...)
  2018-11-12 11:15 ` [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Dongli Zhang
@ 2018-11-13 23:00 ` John Snow
  2019-01-11 14:36 ` Max Reitz
  4 siblings, 0 replies; 17+ messages in thread
From: John Snow @ 2018-11-13 23:00 UTC (permalink / raw)
  To: Marc Olson, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz



On 11/12/18 2:06 AM, Marc Olson via Qemu-devel wrote:
> If 'once' is specified, the rule should execute just once, regardless if
> it is supposed to return an error or not. Take the example where you
> want the first IO to an LBA to succeed, but subsequent IOs to fail. You
> could either use state transitions, or create two rules, one with
> error = 0 and once set to true, and one with a non-zero error.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 0759452..327049b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -488,7 +488,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>          }
>      }
>  
> -    if (!rule || !rule->options.inject.error) {
> +    if (!rule) {
>          return 0;
>      }
>  

This gets rid of the early return so that later we check to see if
'once' was set and remove the rule, regardless of if it did anything or not,

> @@ -500,7 +500,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>          remove_rule(rule);
>      }
>  
> -    if (!immediately) {
> +    if (error && !immediately) {

And then we modify this to only trigger if we have an error to inject.

>          aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
>          qemu_coroutine_yield();
>      }
> 

[down here, we return -error, but that should still be zero.]


This changes the mechanism of 'once' slightly, but only when errno was
set to zero. I'm not sure we make use of that anywhere, so I think this
should be a safe change. Certainly we don't stipulate that we only
respect once if you bothered to set errno to a non-zero value.

I thiink this is probably fine.

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types Marc Olson
@ 2018-11-13 23:22   ` John Snow
  2018-11-13 23:34     ` Marc Olson
  2019-01-11 14:41   ` Max Reitz
  1 sibling, 1 reply; 17+ messages in thread
From: John Snow @ 2018-11-13 23:22 UTC (permalink / raw)
  To: Marc Olson, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz



On 11/12/18 2:06 AM, Marc Olson via Qemu-devel wrote:
> Break out the more common parts of the BlkdebugRule struct, and make
> rule_check() more explicit about operating only on error injection types
> so that additional rule types can be added in the future.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c | 59 +++++++++++++++++++++++++++++---------------------------
>  1 file changed, 31 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 327049b..7739849 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -73,13 +73,13 @@ typedef struct BlkdebugRule {
>      BlkdebugEvent event;
>      int action;
>      int state;
> +    int once;
> +    int64_t offset;
>      union {
>          struct {
>              int error;
>              int immediately;
> -            int once;
> -            int64_t offset;
> -        } inject;
> +        } inject_error;

...pulling out "once" and "offset" from inject_error (renamed inject) to
shared properties. Fine, though this looks like it could use more love.
Not your doing.

This adds new dead fields for set_state and suspend which will now work,
but hopefully not do anything.

>          struct {
>              int new_state;
>          } set_state;
> @@ -182,16 +182,16 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
>          .state  = qemu_opt_get_number(opts, "state", 0),
>      };
>  
> +    rule->once = qemu_opt_get_bool(opts, "once", 0);
> +    sector = qemu_opt_get_number(opts, "sector", -1);
> +    rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
> +
>      /* Parse action-specific options */
>      switch (d->action) {
>      case ACTION_INJECT_ERROR:
> -        rule->options.inject.error = qemu_opt_get_number(opts, "errno", EIO);
> -        rule->options.inject.once  = qemu_opt_get_bool(opts, "once", 0);
> -        rule->options.inject.immediately =
> +        rule->options.inject_error.error = qemu_opt_get_number(opts, "errno", EIO);
> +        rule->options.inject_error.immediately =
>              qemu_opt_get_bool(opts, "immediately", 0);
> -        sector = qemu_opt_get_number(opts, "sector", -1);
> -        rule->options.inject.offset =
> -            sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>          break;
>  
>      case ACTION_SET_STATE:
> @@ -474,38 +474,41 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>  {
>      BDRVBlkdebugState *s = bs->opaque;
>      BlkdebugRule *rule = NULL;
> +    BlkdebugRule *error_rule = NULL;
>      int error;
>      bool immediately;
> +    int ret = 0;
>  
>      QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
> -        uint64_t inject_offset = rule->options.inject.offset;
> -
> -        if (inject_offset == -1 ||
> -            (bytes && inject_offset >= offset &&
> -             inject_offset < offset + bytes))
> +        if (rule->offset == -1 ||
> +            (bytes && rule->offset >= offset &&
> +             rule->offset < offset + bytes))
>          {
> -            break;
> +            if (rule->action == ACTION_INJECT_ERROR) {
> +                error_rule = rule;
> +                break;
> +            }
>          }
>      }
>  
> -    if (!rule) {
> -        return 0;
> -    }
> +    if (error_rule) {
> +        immediately = error_rule->options.inject_error.immediately;
> +        error = error_rule->options.inject_error.error;
>  
> -    immediately = rule->options.inject.immediately;
> -    error = rule->options.inject.error;
> +        if (error_rule->once) {
> +            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
> +            remove_rule(error_rule);
> +        }
>  
> -    if (rule->options.inject.once) {
> -        QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
> -        remove_rule(rule);
> -    }
> +        if (error && !immediately) {
> +            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> +            qemu_coroutine_yield();
> +        }
>  
> -    if (error && !immediately) {
> -        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> -        qemu_coroutine_yield();
> +        ret = -error;
>      }

Bit messy as a diff, but it seems to check out. As a bonus we now
actually check the tag of the rules we're iterating through, so that
seems like an improvement.

Reviewed-By: John Snow <jsnow@redhat.com>

>  
> -    return -error;
> +    return ret;
>  }
>  
>  static int coroutine_fn
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types
  2018-11-13 23:22   ` John Snow
@ 2018-11-13 23:34     ` Marc Olson
  2018-11-13 23:38       ` John Snow
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Olson @ 2018-11-13 23:34 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On 11/13/18 3:22 PM, John Snow wrote:
>
> On 11/12/18 2:06 AM, Marc Olson via Qemu-devel wrote:
>> Break out the more common parts of the BlkdebugRule struct, and make
>> rule_check() more explicit about operating only on error injection types
>> so that additional rule types can be added in the future.
>>
>> Signed-off-by: Marc Olson <marcolso@amazon.com>
>> ---
>>   block/blkdebug.c | 59 +++++++++++++++++++++++++++++---------------------------
>>   1 file changed, 31 insertions(+), 28 deletions(-)
>>
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index 327049b..7739849 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
>> @@ -73,13 +73,13 @@ typedef struct BlkdebugRule {
>>       BlkdebugEvent event;
>>       int action;
>>       int state;
>> +    int once;
>> +    int64_t offset;
>>       union {
>>           struct {
>>               int error;
>>               int immediately;
>> -            int once;
>> -            int64_t offset;
>> -        } inject;
>> +        } inject_error;
> ...pulling out "once" and "offset" from inject_error (renamed inject) to
> shared properties. Fine, though this looks like it could use more love.
> Not your doing.
>
> This adds new dead fields for set_state and suspend which will now work,
> but hopefully not do anything.


I think set_state was already there?

>
>>           struct {
>>               int new_state;
>>           } set_state;
>> @@ -182,16 +182,16 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
>>           .state  = qemu_opt_get_number(opts, "state", 0),
>>       };
>>   
>> +    rule->once = qemu_opt_get_bool(opts, "once", 0);
>> +    sector = qemu_opt_get_number(opts, "sector", -1);
>> +    rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>> +
>>       /* Parse action-specific options */
>>       switch (d->action) {
>>       case ACTION_INJECT_ERROR:
>> -        rule->options.inject.error = qemu_opt_get_number(opts, "errno", EIO);
>> -        rule->options.inject.once  = qemu_opt_get_bool(opts, "once", 0);
>> -        rule->options.inject.immediately =
>> +        rule->options.inject_error.error = qemu_opt_get_number(opts, "errno", EIO);
>> +        rule->options.inject_error.immediately =
>>               qemu_opt_get_bool(opts, "immediately", 0);
>> -        sector = qemu_opt_get_number(opts, "sector", -1);
>> -        rule->options.inject.offset =
>> -            sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>>           break;
>>   
>>       case ACTION_SET_STATE:
>> @@ -474,38 +474,41 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>>   {
>>       BDRVBlkdebugState *s = bs->opaque;
>>       BlkdebugRule *rule = NULL;
>> +    BlkdebugRule *error_rule = NULL;
>>       int error;
>>       bool immediately;
>> +    int ret = 0;
>>   
>>       QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
>> -        uint64_t inject_offset = rule->options.inject.offset;
>> -
>> -        if (inject_offset == -1 ||
>> -            (bytes && inject_offset >= offset &&
>> -             inject_offset < offset + bytes))
>> +        if (rule->offset == -1 ||
>> +            (bytes && rule->offset >= offset &&
>> +             rule->offset < offset + bytes))
>>           {
>> -            break;
>> +            if (rule->action == ACTION_INJECT_ERROR) {
>> +                error_rule = rule;
>> +                break;
>> +            }
>>           }
>>       }
>>   
>> -    if (!rule) {
>> -        return 0;
>> -    }
>> +    if (error_rule) {
>> +        immediately = error_rule->options.inject_error.immediately;
>> +        error = error_rule->options.inject_error.error;
>>   
>> -    immediately = rule->options.inject.immediately;
>> -    error = rule->options.inject.error;
>> +        if (error_rule->once) {
>> +            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
>> +            remove_rule(error_rule);
>> +        }
>>   
>> -    if (rule->options.inject.once) {
>> -        QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
>> -        remove_rule(rule);
>> -    }
>> +        if (error && !immediately) {
>> +            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
>> +            qemu_coroutine_yield();
>> +        }
>>   
>> -    if (error && !immediately) {
>> -        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
>> -        qemu_coroutine_yield();
>> +        ret = -error;
>>       }
> Bit messy as a diff, but it seems to check out. As a bonus we now
> actually check the tag of the rules we're iterating through, so that
> seems like an improvement.


Unfortunately git made a bit of a mess out of the diff.

>
> Reviewed-By: John Snow <jsnow@redhat.com>
>
>>   
>> -    return -error;
>> +    return ret;
>>   }
>>   
>>   static int coroutine_fn
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types
  2018-11-13 23:34     ` Marc Olson
@ 2018-11-13 23:38       ` John Snow
  0 siblings, 0 replies; 17+ messages in thread
From: John Snow @ 2018-11-13 23:38 UTC (permalink / raw)
  To: Marc Olson, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz



On 11/13/18 6:34 PM, Marc Olson wrote:
> On 11/13/18 3:22 PM, John Snow wrote:
>>
>> On 11/12/18 2:06 AM, Marc Olson via Qemu-devel wrote:
>>> Break out the more common parts of the BlkdebugRule struct, and make
>>> rule_check() more explicit about operating only on error injection types
>>> so that additional rule types can be added in the future.
>>>
>>> Signed-off-by: Marc Olson <marcolso@amazon.com>
>>> ---
>>>   block/blkdebug.c | 59
>>> +++++++++++++++++++++++++++++---------------------------
>>>   1 file changed, 31 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>>> index 327049b..7739849 100644
>>> --- a/block/blkdebug.c
>>> +++ b/block/blkdebug.c
>>> @@ -73,13 +73,13 @@ typedef struct BlkdebugRule {
>>>       BlkdebugEvent event;
>>>       int action;
>>>       int state;
>>> +    int once;
>>> +    int64_t offset;
>>>       union {
>>>           struct {
>>>               int error;
>>>               int immediately;
>>> -            int once;
>>> -            int64_t offset;
>>> -        } inject;
>>> +        } inject_error;
>> ...pulling out "once" and "offset" from inject_error (renamed inject) to
>> shared properties. Fine, though this looks like it could use more love.
>> Not your doing.
>>
>> This adds new dead fields for set_state and suspend which will now work,
>> but hopefully not do anything.
> 
> 
> I think set_state was already there?
> 

Yes, I just mean to say that "once" and "offset" now get defined for
set_state tagged rules, and could theoretically be specified by the user
(I think?) I don't think it will change anything. Just pointing it out
in case anyone knows better than I do.

>>
>>>           struct {
>>>               int new_state;
>>>           } set_state;
>>> @@ -182,16 +182,16 @@ static int add_rule(void *opaque, QemuOpts
>>> *opts, Error **errp)
>>>           .state  = qemu_opt_get_number(opts, "state", 0),
>>>       };
>>>   +    rule->once = qemu_opt_get_bool(opts, "once", 0);
>>> +    sector = qemu_opt_get_number(opts, "sector", -1);
>>> +    rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>>> +
>>>       /* Parse action-specific options */
>>>       switch (d->action) {
>>>       case ACTION_INJECT_ERROR:
>>> -        rule->options.inject.error = qemu_opt_get_number(opts,
>>> "errno", EIO);
>>> -        rule->options.inject.once  = qemu_opt_get_bool(opts, "once",
>>> 0);
>>> -        rule->options.inject.immediately =
>>> +        rule->options.inject_error.error = qemu_opt_get_number(opts,
>>> "errno", EIO);
>>> +        rule->options.inject_error.immediately =
>>>               qemu_opt_get_bool(opts, "immediately", 0);
>>> -        sector = qemu_opt_get_number(opts, "sector", -1);
>>> -        rule->options.inject.offset =
>>> -            sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
>>>           break;
>>>         case ACTION_SET_STATE:
>>> @@ -474,38 +474,41 @@ static int rule_check(BlockDriverState *bs,
>>> uint64_t offset, uint64_t bytes)
>>>   {
>>>       BDRVBlkdebugState *s = bs->opaque;
>>>       BlkdebugRule *rule = NULL;
>>> +    BlkdebugRule *error_rule = NULL;
>>>       int error;
>>>       bool immediately;
>>> +    int ret = 0;
>>>         QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
>>> -        uint64_t inject_offset = rule->options.inject.offset;
>>> -
>>> -        if (inject_offset == -1 ||
>>> -            (bytes && inject_offset >= offset &&
>>> -             inject_offset < offset + bytes))
>>> +        if (rule->offset == -1 ||
>>> +            (bytes && rule->offset >= offset &&
>>> +             rule->offset < offset + bytes))
>>>           {
>>> -            break;
>>> +            if (rule->action == ACTION_INJECT_ERROR) {
>>> +                error_rule = rule;
>>> +                break;
>>> +            }
>>>           }
>>>       }
>>>   -    if (!rule) {
>>> -        return 0;
>>> -    }
>>> +    if (error_rule) {
>>> +        immediately = error_rule->options.inject_error.immediately;
>>> +        error = error_rule->options.inject_error.error;
>>>   -    immediately = rule->options.inject.immediately;
>>> -    error = rule->options.inject.error;
>>> +        if (error_rule->once) {
>>> +            QSIMPLEQ_REMOVE(&s->active_rules, error_rule,
>>> BlkdebugRule, active_next);
>>> +            remove_rule(error_rule);
>>> +        }
>>>   -    if (rule->options.inject.once) {
>>> -        QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule,
>>> active_next);
>>> -        remove_rule(rule);
>>> -    }
>>> +        if (error && !immediately) {
>>> +            aio_co_schedule(qemu_get_current_aio_context(),
>>> qemu_coroutine_self());
>>> +            qemu_coroutine_yield();
>>> +        }
>>>   -    if (error && !immediately) {
>>> -        aio_co_schedule(qemu_get_current_aio_context(),
>>> qemu_coroutine_self());
>>> -        qemu_coroutine_yield();
>>> +        ret = -error;
>>>       }
>> Bit messy as a diff, but it seems to check out. As a bonus we now
>> actually check the tag of the rules we're iterating through, so that
>> seems like an improvement.
> 
> 
> Unfortunately git made a bit of a mess out of the diff.
> 
>>
>> Reviewed-By: John Snow <jsnow@redhat.com>
>>
>>>   -    return -error;
>>> +    return ret;
>>>   }
>>>     static int coroutine_fn
>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type Marc Olson
@ 2018-11-13 23:57   ` John Snow
  2019-01-11 15:00   ` Max Reitz
  1 sibling, 0 replies; 17+ messages in thread
From: John Snow @ 2018-11-13 23:57 UTC (permalink / raw)
  To: Marc Olson, qemu-devel
  Cc: Kevin Wolf, qemu-block, Markus Armbruster, Max Reitz, Eric Blake



On 11/12/18 2:06 AM, Marc Olson via Qemu-devel wrote:
> Add a new rule type for blkdebug that instead of returning an error, can
> inject latency to an IO.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c           | 79 +++++++++++++++++++++++++++++++++++++++++++---
>  docs/devel/blkdebug.txt    | 35 ++++++++++++++------
>  qapi/block-core.json       | 31 ++++++++++++++++++
>  tests/qemu-iotests/071     | 63 ++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/071.out | 31 ++++++++++++++++++
>  5 files changed, 226 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 7739849..6b1f2d6 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -65,6 +65,7 @@ typedef struct BlkdebugSuspendedReq {
>  
>  enum {
>      ACTION_INJECT_ERROR,
> +    ACTION_INJECT_DELAY,
>      ACTION_SET_STATE,
>      ACTION_SUSPEND,
>  };
> @@ -81,6 +82,9 @@ typedef struct BlkdebugRule {
>              int immediately;
>          } inject_error;
>          struct {
> +            int64_t latency;
> +        } delay;
> +        struct {
>              int new_state;
>          } set_state;
>          struct {
> @@ -123,6 +127,34 @@ static QemuOptsList inject_error_opts = {
>      },
>  };
>  
> +static QemuOptsList inject_delay_opts = {
> +    .name = "inject-delay",
> +    .head = QTAILQ_HEAD_INITIALIZER(inject_delay_opts.head),
> +    .desc = {
> +        {
> +            .name = "event",
> +            .type = QEMU_OPT_STRING,
> +        },
> +        {
> +            .name = "state",
> +            .type = QEMU_OPT_NUMBER,
> +        },
> +        {
> +            .name = "latency",
> +            .type = QEMU_OPT_NUMBER,
> +        },
> +        {
> +            .name = "sector",
> +            .type = QEMU_OPT_NUMBER,
> +        },
> +        {
> +            .name = "once",
> +            .type = QEMU_OPT_BOOL,
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +

Lot of redundancy again, but ... it's just a debugging interface, so...

>  static QemuOptsList set_state_opts = {
>      .name = "set-state",
>      .head = QTAILQ_HEAD_INITIALIZER(set_state_opts.head),
> @@ -145,6 +177,7 @@ static QemuOptsList set_state_opts = {
>  
>  static QemuOptsList *config_groups[] = {
>      &inject_error_opts,
> +    &inject_delay_opts,
>      &set_state_opts,
>      NULL
>  };
> @@ -194,6 +227,11 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
>              qemu_opt_get_bool(opts, "immediately", 0);
>          break;
>  
> +    case ACTION_INJECT_DELAY:
> +        rule->options.delay.latency =
> +            qemu_opt_get_number(opts, "latency", 100) * SCALE_US;
> +        break;
> +
>      case ACTION_SET_STATE:
>          rule->options.set_state.new_state =
>              qemu_opt_get_number(opts, "new_state", 0);
> @@ -226,6 +264,12 @@ static void remove_rule(BlkdebugRule *rule)
>      g_free(rule);
>  }
>  
> +static void remove_active_rule(BDRVBlkdebugState *s, BlkdebugRule *rule)
> +{
> +    QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
> +    remove_rule(rule);
> +}
> +
>  static int read_config(BDRVBlkdebugState *s, const char *filename,
>                         QDict *options, Error **errp)
>  {
> @@ -264,6 +308,14 @@ static int read_config(BDRVBlkdebugState *s, const char *filename,
>          goto fail;
>      }
>  
> +    d.action = ACTION_INJECT_DELAY;
> +    qemu_opts_foreach(&inject_delay_opts, add_rule, &d, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
>      d.action = ACTION_SET_STATE;
>      qemu_opts_foreach(&set_state_opts, add_rule, &d, &local_err);
>      if (local_err) {
> @@ -275,6 +327,7 @@ static int read_config(BDRVBlkdebugState *s, const char *filename,
>      ret = 0;
>  fail:
>      qemu_opts_reset(&inject_error_opts);
> +    qemu_opts_reset(&inject_delay_opts);
>      qemu_opts_reset(&set_state_opts);
>      if (f) {
>          fclose(f);
> @@ -474,7 +527,8 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>  {
>      BDRVBlkdebugState *s = bs->opaque;
>      BlkdebugRule *rule = NULL;
> -    BlkdebugRule *error_rule = NULL;
> +    BlkdebugRule *error_rule = NULL, *delay_rule = NULL;
> +    int64_t latency;
>      int error;
>      bool immediately;
>      int ret = 0;
> @@ -484,20 +538,36 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>              (bytes && rule->offset >= offset &&
>               rule->offset < offset + bytes))
>          {
> -            if (rule->action == ACTION_INJECT_ERROR) {
> +            if (!error_rule && rule->action == ACTION_INJECT_ERROR) {
>                  error_rule = rule;
> +            } else if (!delay_rule && rule->action == ACTION_INJECT_DELAY) {
> +                delay_rule = rule;
> +            }
> +
> +            if (error_rule && delay_rule) {
>                  break;
>              }
>          }
>      }
>  
> +    if (delay_rule) {
> +        latency = delay_rule->options.delay.latency;
> +
> +        if (delay_rule->once) {
> +            remove_active_rule(s, delay_rule);
> +        }
> +
> +        if (latency != 0) {
> +            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
> +        }
> +    }
> +
>      if (error_rule) {
>          immediately = error_rule->options.inject_error.immediately;
>          error = error_rule->options.inject_error.error;
>  
>          if (error_rule->once) {
> -            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
> -            remove_rule(error_rule);
> +            remove_active_rule(s, error_rule);
>          }
>  
>          if (error && !immediately) {
> @@ -697,6 +767,7 @@ static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
>      /* Take the action */
>      switch (rule->action) {
>      case ACTION_INJECT_ERROR:
> +    case ACTION_INJECT_DELAY:
>          if (!injected) {
>              QSIMPLEQ_INIT(&s->active_rules);
>              injected = true;
> diff --git a/docs/devel/blkdebug.txt b/docs/devel/blkdebug.txt
> index 43d8e8f..1719835 100644
> --- a/docs/devel/blkdebug.txt
> +++ b/docs/devel/blkdebug.txt
> @@ -24,7 +24,7 @@ This way, all error paths can be tested to make sure they are correct.
>  Rules
>  -----
>  The blkdebug block driver takes a list of "rules" that tell the error injection
> -engine when to fail an I/O request.
> +engine when to either fail or add latency to an I/O request.
>  
>  Each I/O request is evaluated against the rules.  If a rule matches the request
>  then its "action" is executed.
> @@ -33,24 +33,35 @@ Rules can be placed in a configuration file; the configuration file
>  follows the same .ini-like format used by QEMU's -readconfig option, and
>  each section of the file represents a rule.
>  
> -The following configuration file defines a single rule:
> +The following configuration file defines multiple rules:
>  
>    $ cat blkdebug.conf
>    [inject-error]
>    event = "read_aio"
>    errno = "28"
>  
> -This rule fails all aio read requests with ENOSPC (28).  Note that the errno
> -value depends on the host.  On Linux, see
> +  [inject-delay]
> +  event = "read_aio"
> +  sector = "2048"
> +  latency = "500000"
> +
> +The error rule fails all aio read requests with ENOSPC (28).  Note that the
> +errno value depends on the host.  On Linux, see
>  /usr/include/asm-generic/errno-base.h for errno values.
>  
> +The delay rule adds 500 ms of latency to a read I/O request containing sector
> +2048.
> +
> +An error rule and a delay rule can overlap, and both will execute. Only one
> +rule of a given type will be executed for each I/O.
> +
>  Invoke QEMU as follows:
>  
>    $ qemu-system-x86_64
>          -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
>          -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
>  
> -Rules support the following attributes:
> +All rules support the following attributes:
>  
>    event - which type of operation to match (e.g. read_aio, write_aio,
>            flush_to_os, flush_to_disk).  See the "Events" section for
> @@ -60,21 +71,27 @@ Rules support the following attributes:
>            rule to match.  See the "State transitions" section for information
>            on states.
>  
> -  errno - the numeric errno value to return when a request matches this rule.
> -          The errno values depend on the host since the numeric values are not
> -          standarized in the POSIX specification.
> -
>    sector - (optional) a sector number that the request must overlap in order to
>             match this rule
>  
>    once - (optional, default "off") only execute this action on the first
>           matching request
>  
> +Error injection rules support the following additional attributes:
> +
> +  errno - the numeric errno value to return when a request matches this rule.
> +          The errno values depend on the host since the numeric values are not
> +          standarized in the POSIX specification.
> +
>    immediately - (optional, default "off") return a NULL BlockAIOCB
>                  pointer and fail without an errno instead.  This
>                  exercises the code path where BlockAIOCB fails and the
>                  caller's BlockCompletionFunc is not invoked.
>  
> +Delay rules support the following additional attribute:
> +
> +  latency - the delay to add to an I/O request, in microseconds.
> +
>  Events
>  ------
>  Block drivers provide information about the type of I/O request they are about
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index d4fe710..72f7861 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3057,6 +3057,34 @@
>              '*immediately': 'bool' } }
>  
>  ##
> +# @BlkdebugDelayOptions:
> +#
> +# Describes a single latency injection for blkdebug.
> +#
> +# @event:       trigger event
> +#
> +# @state:       the state identifier blkdebug needs to be in to
> +#               actually trigger the event; defaults to "any"
> +#
> +# @latency:     The delay to add to an I/O, in microseconds.
> +#
> +# @sector:      specifies the sector index which has to be affected
> +#               in order to actually trigger the event; defaults to "any
> +#               sector"
> +#
> +# @once:        disables further events after this one has been
> +#               triggered; defaults to false
> +#
> +# Since: 3.1

Probably 3.2 at this point, sorry!

> +##
> +{ 'struct': 'BlkdebugDelayOptions',
> +  'data': { 'event': 'BlkdebugEvent',
> +            '*state': 'int',
> +            '*latency': 'int',
> +            '*sector': 'int',
> +            '*once': 'bool' } }
> +
> +##

Seems fine mechanically. Not sure if the QAPI lords will care about the
redundancy or not.

>  # @BlkdebugSetStateOptions:
>  #
>  # Describes a single state-change event for blkdebug.
> @@ -3115,6 +3143,8 @@
>  #
>  # @inject-error:    array of error injection descriptions
>  #
> +# @inject-delay:    array of delay injection descriptions
> +#
>  # @set-state:       array of state-change descriptions
>  #
>  # Since: 2.9
> @@ -3126,6 +3156,7 @@
>              '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
>              '*opt-discard': 'int32', '*max-discard': 'int32',
>              '*inject-error': ['BlkdebugInjectErrorOptions'],
> +            '*inject-delay': ['BlkdebugDelayOptions'],
>              '*set-state': ['BlkdebugSetStateOptions'] } }
>  
>  ##
> diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
> index 48b4955..976f747 100755
> --- a/tests/qemu-iotests/071
> +++ b/tests/qemu-iotests/071
> @@ -100,6 +100,69 @@ $QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
>           -c 'read -P 42 0x38000 512'
>  
>  echo
> +echo "=== Testing blkdebug latency through filename ==="
> +echo
> +
> +$QEMU_IO -c "open -o file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000 $TEST_IMG" \
> +         -c 'aio_write -P 42 0x28000 512' \
> +         -c 'aio_read -P 42 0x38000 512' \
> +         | _filter_qemu_io
> +
> +echo
> +echo "=== Testing blkdebug latency through file blockref ==="
> +echo
> +
> +$QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG" \
> +         -c 'aio_write -P 42 0x28000 512' \
> +         -c 'aio_read -P 42 0x38000 512' \
> +         | _filter_qemu_io
> +
> +# Using QMP is synchronous by default, so even though we would
> +# expect reordering due to using the aio_* commands, they are
> +# not. The purpose of this test is to verify that the driver
> +# can be setup via QMP, and IO can complete. See the qemu-io
> +# test above to prove delay functionality
> +echo
> +echo "=== Testing blkdebug on existing block device ==="
> +echo
> +
> +run_qemu <<EOF
> +{ "execute": "qmp_capabilities" }
> +{ "execute": "blockdev-add",
> +    "arguments": {
> +        "node-name": "drive0",
> +        "driver": "file",
> +        "filename": "$TEST_IMG"
> +    }
> +}
> +{ "execute": "blockdev-add",
> +    "arguments": {
> +        "driver": "$IMGFMT",
> +        "node-name": "drive0-debug",
> +        "file": {
> +            "driver": "blkdebug",
> +            "image": "drive0",
> +            "inject-delay": [{
> +                "event": "write_aio",
> +                "latency": 10000
> +            }]
> +        }
> +    }
> +}
> +{ "execute": "human-monitor-command",
> +    "arguments": {
> +        "command-line": 'qemu-io drive0-debug "aio_write 0 512"'
> +    }
> +}
> +{ "execute": "human-monitor-command",
> +    "arguments": {
> +        "command-line": 'qemu-io drive0-debug "aio_read 0 512"'
> +    }
> +}
> +{ "execute": "quit" }
> +EOF
> +
> +echo
>  echo "=== Testing blkdebug on existing block device ==="
>  echo
>  
> diff --git a/tests/qemu-iotests/071.out b/tests/qemu-iotests/071.out
> index 1d5e28d..1952990 100644
> --- a/tests/qemu-iotests/071.out
> +++ b/tests/qemu-iotests/071.out
> @@ -36,6 +36,37 @@ read failed: Input/output error
>  
>  read failed: Input/output error
>  
> +=== Testing blkdebug latency through filename ===
> +
> +read 512/512 bytes at offset 229376
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 163840
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +=== Testing blkdebug latency through file blockref ===
> +
> +read 512/512 bytes at offset 229376
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 163840
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +=== Testing blkdebug on existing block device ===
> +
> +Testing:
> +QMP_VERSION
> +{"return": {}}
> +{"return": {}}
> +{"return": {}}
> +wrote 512/512 bytes at offset 0
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +{"return": ""}
> +read 512/512 bytes at offset 0
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +{"return": ""}
> +{"return": {}}
> +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
> +
> +
>  === Testing blkdebug on existing block device ===
>  
>  Testing:
> 

The code loop got a little messier and I'm worried it won't scale well,
but mechanically it seems alright.

I'll let Kevin and Max whine louder if needed :)

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing
  2018-11-12  7:06 [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Marc Olson
                   ` (3 preceding siblings ...)
  2018-11-13 23:00 ` John Snow
@ 2019-01-11 14:36 ` Max Reitz
  4 siblings, 0 replies; 17+ messages in thread
From: Max Reitz @ 2019-01-11 14:36 UTC (permalink / raw)
  To: Marc Olson, qemu-devel; +Cc: jsnow, qemu-block, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

On 12.11.18 08:06, Marc Olson wrote:
> If 'once' is specified, the rule should execute just once, regardless if
> it is supposed to return an error or not. Take the example where you
> want the first IO to an LBA to succeed, but subsequent IOs to fail. You
> could either use state transitions, or create two rules, one with
> error = 0 and once set to true, and one with a non-zero error.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

I don't quite see the point, because the example in the commit message
clearly should be done with a state transition, but the code change
looks reasonable, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types Marc Olson
  2018-11-13 23:22   ` John Snow
@ 2019-01-11 14:41   ` Max Reitz
  1 sibling, 0 replies; 17+ messages in thread
From: Max Reitz @ 2019-01-11 14:41 UTC (permalink / raw)
  To: Marc Olson, qemu-devel; +Cc: jsnow, qemu-block, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On 12.11.18 08:06, Marc Olson wrote:
> Break out the more common parts of the BlkdebugRule struct, and make
> rule_check() more explicit about operating only on error injection types
> so that additional rule types can be added in the future.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c | 59 +++++++++++++++++++++++++++++---------------------------
>  1 file changed, 31 insertions(+), 28 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type Marc Olson
  2018-11-13 23:57   ` John Snow
@ 2019-01-11 15:00   ` Max Reitz
  2019-02-12 21:21     ` Marc Olson
  1 sibling, 1 reply; 17+ messages in thread
From: Max Reitz @ 2019-01-11 15:00 UTC (permalink / raw)
  To: Marc Olson, qemu-devel
  Cc: jsnow, qemu-block, Kevin Wolf, Eric Blake, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 7487 bytes --]

On 12.11.18 08:06, Marc Olson wrote:
> Add a new rule type for blkdebug that instead of returning an error, can
> inject latency to an IO.
> 
> Signed-off-by: Marc Olson <marcolso@amazon.com>
> ---
>  block/blkdebug.c           | 79 +++++++++++++++++++++++++++++++++++++++++++---
>  docs/devel/blkdebug.txt    | 35 ++++++++++++++------
>  qapi/block-core.json       | 31 ++++++++++++++++++
>  tests/qemu-iotests/071     | 63 ++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/071.out | 31 ++++++++++++++++++
>  5 files changed, 226 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 7739849..6b1f2d6 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c

[...]

> @@ -194,6 +227,11 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
>              qemu_opt_get_bool(opts, "immediately", 0);
>          break;
>  
> +    case ACTION_INJECT_DELAY:
> +        rule->options.delay.latency =
> +            qemu_opt_get_number(opts, "latency", 100) * SCALE_US;

Why the default of 100?  I think it would be best if this option were
mandatory.

> +        break;
> +
>      case ACTION_SET_STATE:
>          rule->options.set_state.new_state =
>              qemu_opt_get_number(opts, "new_state", 0);

[...]

> @@ -484,20 +538,36 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>              (bytes && rule->offset >= offset &&
>               rule->offset < offset + bytes))
>          {
> -            if (rule->action == ACTION_INJECT_ERROR) {
> +            if (!error_rule && rule->action == ACTION_INJECT_ERROR) {
>                  error_rule = rule;
> +            } else if (!delay_rule && rule->action == ACTION_INJECT_DELAY) {
> +                delay_rule = rule;
> +            }
> +

How about handling "once" here?

(by adding:

else {
    continue;
}

if (rule->once) {
    remove_active_rule(s, rule);
}

and making the QSIMPLEQ_FOREACH a QSIMPLEQ_FOREACH_SAFE.

(Or maybe in that case we want to inline remove_active_rule().))

It isn't like the state here is too bad, but if we can't handle "once"
in a common code path, I'm torn whether it has a place in the
BlkdebugRule root.  (Doing that makes parsing a bit easier, but OTOH we
just shouldn't parse it for set-state at all, so I'd keep it in the
"unionized structs" if it isn't handled in a common code path.)

> +            if (error_rule && delay_rule) {
>                  break;
>              }
>          }
>      }
>  
> +    if (delay_rule) {
> +        latency = delay_rule->options.delay.latency;
> +
> +        if (delay_rule->once) {
> +            remove_active_rule(s, delay_rule);
> +        }
> +
> +        if (latency != 0) {
> +            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
> +        }
> +    }
> +
>      if (error_rule) {
>          immediately = error_rule->options.inject_error.immediately;
>          error = error_rule->options.inject_error.error;
>  
>          if (error_rule->once) {
> -            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
> -            remove_rule(error_rule);
> +            remove_active_rule(s, error_rule);
>          }
>  
>          if (error && !immediately) {

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index d4fe710..72f7861 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3057,6 +3057,34 @@
>              '*immediately': 'bool' } }
>  
>  ##
> +# @BlkdebugDelayOptions:
> +#
> +# Describes a single latency injection for blkdebug.
> +#
> +# @event:       trigger event
> +#
> +# @state:       the state identifier blkdebug needs to be in to
> +#               actually trigger the event; defaults to "any"
> +#
> +# @latency:     The delay to add to an I/O, in microseconds.
> +#
> +# @sector:      specifies the sector index which has to be affected
> +#               in order to actually trigger the event; defaults to "any
> +#               sector"
> +#
> +# @once:        disables further events after this one has been
> +#               triggered; defaults to false
> +#
> +# Since: 3.1

Well, 4.0 now, sorry...

> +##
> +{ 'struct': 'BlkdebugDelayOptions',
> +  'data': { 'event': 'BlkdebugEvent',
> +            '*state': 'int',
> +            '*latency': 'int',

I'd make this option mandatory.

> +            '*sector': 'int',
> +            '*once': 'bool' } }
> +
> +##
>  # @BlkdebugSetStateOptions:
>  #
>  # Describes a single state-change event for blkdebug.
> @@ -3115,6 +3143,8 @@
>  #
>  # @inject-error:    array of error injection descriptions
>  #
> +# @inject-delay:    array of delay injection descriptions

This needs a "(Since 4.0)".

> +#
>  # @set-state:       array of state-change descriptions
>  #
>  # Since: 2.9

[...]

> diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
> index 48b4955..976f747 100755
> --- a/tests/qemu-iotests/071
> +++ b/tests/qemu-iotests/071
> @@ -100,6 +100,69 @@ $QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
>           -c 'read -P 42 0x38000 512'
>  
>  echo
> +echo "=== Testing blkdebug latency through filename ==="
> +echo
> +
> +$QEMU_IO -c "open -o file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000 $TEST_IMG" \
> +         -c 'aio_write -P 42 0x28000 512' \
> +         -c 'aio_read -P 42 0x38000 512' \
> +         | _filter_qemu_io
> +
> +echo
> +echo "=== Testing blkdebug latency through file blockref ==="
> +echo
> +
> +$QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG" \
> +         -c 'aio_write -P 42 0x28000 512' \
> +         -c 'aio_read -P 42 0x38000 512' \
> +         | _filter_qemu_io
> +
> +# Using QMP is synchronous by default, so even though we would
> +# expect reordering due to using the aio_* commands, they are
> +# not. The purpose of this test is to verify that the driver
> +# can be setup via QMP, and IO can complete. See the qemu-io
> +# test above to prove delay functionality

But it doesn't prove that because the output is filtered.  To prove it,
you'd probably need to use null-co as the protocol (so there is as
little noise as possible) and then parse the qemu-io output to show that
the time is always above 10 ms.

I leave it to you whether you'd like to go through that pain.

[...]

> diff --git a/tests/qemu-iotests/071.out b/tests/qemu-iotests/071.out
> index 1d5e28d..1952990 100644
> --- a/tests/qemu-iotests/071.out
> +++ b/tests/qemu-iotests/071.out
> @@ -36,6 +36,37 @@ read failed: Input/output error
>  
>  read failed: Input/output error
>  
> +=== Testing blkdebug latency through filename ===
> +
> +read 512/512 bytes at offset 229376
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 163840
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +=== Testing blkdebug latency through file blockref ===
> +
> +read 512/512 bytes at offset 229376
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 163840
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

(As you can see, the output is filtered.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2019-01-11 15:00   ` Max Reitz
@ 2019-02-12 21:21     ` Marc Olson
  2019-02-13 15:48       ` Max Reitz
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Olson @ 2019-02-12 21:21 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: jsnow, qemu-block, Kevin Wolf, Eric Blake, Markus Armbruster

On 1/11/19 7:00 AM, Max Reitz wrote:
> On 12.11.18 08:06, Marc Olson wrote:
>> Add a new rule type for blkdebug that instead of returning an error, can
>> inject latency to an IO.
>>
>> Signed-off-by: Marc Olson <marcolso@amazon.com>
>> ---
>>   block/blkdebug.c           | 79 +++++++++++++++++++++++++++++++++++++++++++---
>>   docs/devel/blkdebug.txt    | 35 ++++++++++++++------
>>   qapi/block-core.json       | 31 ++++++++++++++++++
>>   tests/qemu-iotests/071     | 63 ++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/071.out | 31 ++++++++++++++++++
>>   5 files changed, 226 insertions(+), 13 deletions(-)
>>
>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>> index 7739849..6b1f2d6 100644
>> --- a/block/blkdebug.c
>> +++ b/block/blkdebug.c
> [...]
>
>> @@ -194,6 +227,11 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
>>               qemu_opt_get_bool(opts, "immediately", 0);
>>           break;
>>   
>> +    case ACTION_INJECT_DELAY:
>> +        rule->options.delay.latency =
>> +            qemu_opt_get_number(opts, "latency", 100) * SCALE_US;
> Why the default of 100?  I think it would be best if this option were
> mandatory.
Ok.
>
>> +        break;
>> +
>>       case ACTION_SET_STATE:
>>           rule->options.set_state.new_state =
>>               qemu_opt_get_number(opts, "new_state", 0);
> [...]
>
>> @@ -484,20 +538,36 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
>>               (bytes && rule->offset >= offset &&
>>                rule->offset < offset + bytes))
>>           {
>> -            if (rule->action == ACTION_INJECT_ERROR) {
>> +            if (!error_rule && rule->action == ACTION_INJECT_ERROR) {
>>                   error_rule = rule;
>> +            } else if (!delay_rule && rule->action == ACTION_INJECT_DELAY) {
>> +                delay_rule = rule;
>> +            }
>> +
> How about handling "once" here?
>
> (by adding:
>
> else {
>      continue;
> }
>
> if (rule->once) {
>      remove_active_rule(s, rule);
> }
>
> and making the QSIMPLEQ_FOREACH a QSIMPLEQ_FOREACH_SAFE.
>
> (Or maybe in that case we want to inline remove_active_rule().))
>
> It isn't like the state here is too bad, but if we can't handle "once"
> in a common code path, I'm torn whether it has a place in the
> BlkdebugRule root.  (Doing that makes parsing a bit easier, but OTOH we
> just shouldn't parse it for set-state at all, so I'd keep it in the
> "unionized structs" if it isn't handled in a common code path.)
Yes, this makes sense.
>
>> +            if (error_rule && delay_rule) {
>>                   break;
>>               }
>>           }
>>       }
>>   
>> +    if (delay_rule) {
>> +        latency = delay_rule->options.delay.latency;
>> +
>> +        if (delay_rule->once) {
>> +            remove_active_rule(s, delay_rule);
>> +        }
>> +
>> +        if (latency != 0) {
>> +            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
>> +        }
>> +    }
>> +
>>       if (error_rule) {
>>           immediately = error_rule->options.inject_error.immediately;
>>           error = error_rule->options.inject_error.error;
>>   
>>           if (error_rule->once) {
>> -            QSIMPLEQ_REMOVE(&s->active_rules, error_rule, BlkdebugRule, active_next);
>> -            remove_rule(error_rule);
>> +            remove_active_rule(s, error_rule);
>>           }
>>   
>>           if (error && !immediately) {
> [...]
>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index d4fe710..72f7861 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3057,6 +3057,34 @@
>>               '*immediately': 'bool' } }
>>   
>>   ##
>> +# @BlkdebugDelayOptions:
>> +#
>> +# Describes a single latency injection for blkdebug.
>> +#
>> +# @event:       trigger event
>> +#
>> +# @state:       the state identifier blkdebug needs to be in to
>> +#               actually trigger the event; defaults to "any"
>> +#
>> +# @latency:     The delay to add to an I/O, in microseconds.
>> +#
>> +# @sector:      specifies the sector index which has to be affected
>> +#               in order to actually trigger the event; defaults to "any
>> +#               sector"
>> +#
>> +# @once:        disables further events after this one has been
>> +#               triggered; defaults to false
>> +#
>> +# Since: 3.1
> Well, 4.0 now, sorry...
Baking version numbers into code is downright silly. Every single 
version of this patch has made a comment along the lines of, "oops, it 
didn't get reviewed in time for the next version bump, so you have to 
resubmit." With a fast moving project, this is nonsense. If you're 
looking at the code, you should have access to the git history as well 
to determine the version.
>
>> +##
>> +{ 'struct': 'BlkdebugDelayOptions',
>> +  'data': { 'event': 'BlkdebugEvent',
>> +            '*state': 'int',
>> +            '*latency': 'int',
> I'd make this option mandatory.
>
>> +            '*sector': 'int',
>> +            '*once': 'bool' } }
>> +
>> +##
>>   # @BlkdebugSetStateOptions:
>>   #
>>   # Describes a single state-change event for blkdebug.
>> @@ -3115,6 +3143,8 @@
>>   #
>>   # @inject-error:    array of error injection descriptions
>>   #
>> +# @inject-delay:    array of delay injection descriptions
> This needs a "(Since 4.0)".
>
>> +#
>>   # @set-state:       array of state-change descriptions
>>   #
>>   # Since: 2.9
> [...]
>
>> diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
>> index 48b4955..976f747 100755
>> --- a/tests/qemu-iotests/071
>> +++ b/tests/qemu-iotests/071
>> @@ -100,6 +100,69 @@ $QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
>>            -c 'read -P 42 0x38000 512'
>>   
>>   echo
>> +echo "=== Testing blkdebug latency through filename ==="
>> +echo
>> +
>> +$QEMU_IO -c "open -o file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000 $TEST_IMG" \
>> +         -c 'aio_write -P 42 0x28000 512' \
>> +         -c 'aio_read -P 42 0x38000 512' \
>> +         | _filter_qemu_io
>> +
>> +echo
>> +echo "=== Testing blkdebug latency through file blockref ==="
>> +echo
>> +
>> +$QEMU_IO -c "open -o driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG" \
>> +         -c 'aio_write -P 42 0x28000 512' \
>> +         -c 'aio_read -P 42 0x38000 512' \
>> +         | _filter_qemu_io
>> +
>> +# Using QMP is synchronous by default, so even though we would
>> +# expect reordering due to using the aio_* commands, they are
>> +# not. The purpose of this test is to verify that the driver
>> +# can be setup via QMP, and IO can complete. See the qemu-io
>> +# test above to prove delay functionality
> But it doesn't prove that because the output is filtered.  To prove it,
> you'd probably need to use null-co as the protocol (so there is as
> little noise as possible) and then parse the qemu-io output to show that
> the time is always above 10 ms.
>
> I leave it to you whether you'd like to go through that pain.
There's not a great way to prove it without doing a lot of parsing 
changes in testing. I'll consider an update to this patch, but I think 
this series has carried on long enough.
>
> [...]
>
>> diff --git a/tests/qemu-iotests/071.out b/tests/qemu-iotests/071.out
>> index 1d5e28d..1952990 100644
>> --- a/tests/qemu-iotests/071.out
>> +++ b/tests/qemu-iotests/071.out
>> @@ -36,6 +36,37 @@ read failed: Input/output error
>>   
>>   read failed: Input/output error
>>   
>> +=== Testing blkdebug latency through filename ===
>> +
>> +read 512/512 bytes at offset 229376
>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +wrote 512/512 bytes at offset 163840
>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +
>> +=== Testing blkdebug latency through file blockref ===
>> +
>> +read 512/512 bytes at offset 229376
>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +wrote 512/512 bytes at offset 163840
>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> (As you can see, the output is filtered.)
>
> Max
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2019-02-12 21:21     ` Marc Olson
@ 2019-02-13 15:48       ` Max Reitz
  2019-02-13 20:49         ` Marc Olson
  0 siblings, 1 reply; 17+ messages in thread
From: Max Reitz @ 2019-02-13 15:48 UTC (permalink / raw)
  To: Marc Olson, qemu-devel
  Cc: jsnow, qemu-block, Kevin Wolf, Eric Blake, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 5522 bytes --]

On 12.02.19 22:21, Marc Olson wrote:
> On 1/11/19 7:00 AM, Max Reitz wrote:
>> On 12.11.18 08:06, Marc Olson wrote:

[...]

>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index d4fe710..72f7861 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -3057,6 +3057,34 @@
>>>               '*immediately': 'bool' } }
>>>     ##
>>> +# @BlkdebugDelayOptions:
>>> +#
>>> +# Describes a single latency injection for blkdebug.
>>> +#
>>> +# @event:       trigger event
>>> +#
>>> +# @state:       the state identifier blkdebug needs to be in to
>>> +#               actually trigger the event; defaults to "any"
>>> +#
>>> +# @latency:     The delay to add to an I/O, in microseconds.
>>> +#
>>> +# @sector:      specifies the sector index which has to be affected
>>> +#               in order to actually trigger the event; defaults to
>>> "any
>>> +#               sector"
>>> +#
>>> +# @once:        disables further events after this one has been
>>> +#               triggered; defaults to false
>>> +#
>>> +# Since: 3.1
>> Well, 4.0 now, sorry...
> Baking version numbers into code is downright silly. Every single
> version of this patch has made a comment along the lines of, "oops, it
> didn't get reviewed in time for the next version bump, so you have to
> resubmit." With a fast moving project, this is nonsense. If you're
> looking at the code, you should have access to the git history as well
> to determine the version.

True, but these comments are used to generate documentation (e.g.
docs/interopt/qemu-qmp-ref.7 in the build directory).  So they are used
by people who don't have access to the git history.

It might be possible to generate that information from git-blame when
generating the documentation, but how would trivial fixes be handled
that are no functional changes?  For instance, it seems difficult to me
to distinguish between a spelling change for some parameter description
and a change in behavior.

(Then again, we shouldn't have such behavioral changes, hm.)

To me personally, the easiest thing would seem to be some convention to
write ***INSERT VERSION HERE*** into the code and then the maintainer
can just find an replace all instances of that when applying the
patches.  But that sounds a bit silly...

(I don't have an issue with adjusting the version numbers myself as they
are, but it's just as hard for me to find and replace all of them as it
is for you.  And since you're probably going to send a v4 anyway...)

In the end, it's up to the QAPI maintainers.

[...]

>>> diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
>>> index 48b4955..976f747 100755
>>> --- a/tests/qemu-iotests/071
>>> +++ b/tests/qemu-iotests/071
>>> @@ -100,6 +100,69 @@ $QEMU_IO -c "open -o
>>> driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
>>>            -c 'read -P 42 0x38000 512'
>>>     echo
>>> +echo "=== Testing blkdebug latency through filename ==="
>>> +echo
>>> +
>>> +$QEMU_IO -c "open -o
>>> file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000
>>> $TEST_IMG" \
>>> +         -c 'aio_write -P 42 0x28000 512' \
>>> +         -c 'aio_read -P 42 0x38000 512' \
>>> +         | _filter_qemu_io
>>> +
>>> +echo
>>> +echo "=== Testing blkdebug latency through file blockref ==="
>>> +echo
>>> +
>>> +$QEMU_IO -c "open -o
>>> driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG"
>>> \
>>> +         -c 'aio_write -P 42 0x28000 512' \
>>> +         -c 'aio_read -P 42 0x38000 512' \
>>> +         | _filter_qemu_io
>>> +
>>> +# Using QMP is synchronous by default, so even though we would
>>> +# expect reordering due to using the aio_* commands, they are
>>> +# not. The purpose of this test is to verify that the driver
>>> +# can be setup via QMP, and IO can complete. See the qemu-io
>>> +# test above to prove delay functionality
>> But it doesn't prove that because the output is filtered.  To prove it,
>> you'd probably need to use null-co as the protocol (so there is as
>> little noise as possible) and then parse the qemu-io output to show that
>> the time is always above 10 ms.
>>
>> I leave it to you whether you'd like to go through that pain.
> There's not a great way to prove it without doing a lot of parsing
> changes in testing. I'll consider an update to this patch, but I think
> this series has carried on long enough.

I agree in this instance, but I'd like to note still that "this series
has carried on long enough" is not an argument to merge bad code (or
something incomplete without promise of a follow-up).  This is just a
test for blkdebug, though, so it's OK (with the comment fixed, because
it doesn't prove anything).

(I'm sorry for not having looked at this series for so long, but qemu is
not my own, so it isn't like I could pay for my wrong by accepting
something incomplete -- if it were more important than this single test
case.)

Also, we do support Python for iotests, and parsing should be simpler
there.  Since blkdebug is just a debugging driver, I'm fine with not
knowing whether its features work, though.

Maybe I'll write the test, that would kind of pay for my wrong...

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2019-02-13 15:48       ` Max Reitz
@ 2019-02-13 20:49         ` Marc Olson
  2019-02-13 21:12           ` Max Reitz
  2019-02-14  6:39           ` Markus Armbruster
  0 siblings, 2 replies; 17+ messages in thread
From: Marc Olson @ 2019-02-13 20:49 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: jsnow, qemu-block, Kevin Wolf, Eric Blake, Markus Armbruster

On 2/13/19 7:48 AM, Max Reitz wrote:
> On 12.02.19 22:21, Marc Olson wrote:
>> On 1/11/19 7:00 AM, Max Reitz wrote:
>>> On 12.11.18 08:06, Marc Olson wrote:
> [...]
>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index d4fe710..72f7861 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -3057,6 +3057,34 @@
>>>>                '*immediately': 'bool' } }
>>>>      ##
>>>> +# @BlkdebugDelayOptions:
>>>> +#
>>>> +# Describes a single latency injection for blkdebug.
>>>> +#
>>>> +# @event:       trigger event
>>>> +#
>>>> +# @state:       the state identifier blkdebug needs to be in to
>>>> +#               actually trigger the event; defaults to "any"
>>>> +#
>>>> +# @latency:     The delay to add to an I/O, in microseconds.
>>>> +#
>>>> +# @sector:      specifies the sector index which has to be affected
>>>> +#               in order to actually trigger the event; defaults to
>>>> "any
>>>> +#               sector"
>>>> +#
>>>> +# @once:        disables further events after this one has been
>>>> +#               triggered; defaults to false
>>>> +#
>>>> +# Since: 3.1
>>> Well, 4.0 now, sorry...
>> Baking version numbers into code is downright silly. Every single
>> version of this patch has made a comment along the lines of, "oops, it
>> didn't get reviewed in time for the next version bump, so you have to
>> resubmit." With a fast moving project, this is nonsense. If you're
>> looking at the code, you should have access to the git history as well
>> to determine the version.
> True, but these comments are used to generate documentation (e.g.
> docs/interopt/qemu-qmp-ref.7 in the build directory).  So they are used
> by people who don't have access to the git history.
>
> It might be possible to generate that information from git-blame when
> generating the documentation, but how would trivial fixes be handled
> that are no functional changes?  For instance, it seems difficult to me
> to distinguish between a spelling change for some parameter description
> and a change in behavior.
>
> (Then again, we shouldn't have such behavioral changes, hm.)
>
> To me personally, the easiest thing would seem to be some convention to
> write ***INSERT VERSION HERE*** into the code and then the maintainer
> can just find an replace all instances of that when applying the
> patches.  But that sounds a bit silly...
>
> (I don't have an issue with adjusting the version numbers myself as they
> are, but it's just as hard for me to find and replace all of them as it
> is for you.  And since you're probably going to send a v4 anyway...)
>
> In the end, it's up to the QAPI maintainers.

IFF I submit by the end of the week, what version number shall I choose?


> [...]
>
>>>> diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
>>>> index 48b4955..976f747 100755
>>>> --- a/tests/qemu-iotests/071
>>>> +++ b/tests/qemu-iotests/071
>>>> @@ -100,6 +100,69 @@ $QEMU_IO -c "open -o
>>>> driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
>>>>             -c 'read -P 42 0x38000 512'
>>>>      echo
>>>> +echo "=== Testing blkdebug latency through filename ==="
>>>> +echo
>>>> +
>>>> +$QEMU_IO -c "open -o
>>>> file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000
>>>> $TEST_IMG" \
>>>> +         -c 'aio_write -P 42 0x28000 512' \
>>>> +         -c 'aio_read -P 42 0x38000 512' \
>>>> +         | _filter_qemu_io
>>>> +
>>>> +echo
>>>> +echo "=== Testing blkdebug latency through file blockref ==="
>>>> +echo
>>>> +
>>>> +$QEMU_IO -c "open -o
>>>> driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG"
>>>> \
>>>> +         -c 'aio_write -P 42 0x28000 512' \
>>>> +         -c 'aio_read -P 42 0x38000 512' \
>>>> +         | _filter_qemu_io
>>>> +
>>>> +# Using QMP is synchronous by default, so even though we would
>>>> +# expect reordering due to using the aio_* commands, they are
>>>> +# not. The purpose of this test is to verify that the driver
>>>> +# can be setup via QMP, and IO can complete. See the qemu-io
>>>> +# test above to prove delay functionality
>>> But it doesn't prove that because the output is filtered.  To prove it,
>>> you'd probably need to use null-co as the protocol (so there is as
>>> little noise as possible) and then parse the qemu-io output to show that
>>> the time is always above 10 ms.
>>>
>>> I leave it to you whether you'd like to go through that pain.
>> There's not a great way to prove it without doing a lot of parsing
>> changes in testing. I'll consider an update to this patch, but I think
>> this series has carried on long enough.
> I agree in this instance, but I'd like to note still that "this series
> has carried on long enough" is not an argument to merge bad code (or
> something incomplete without promise of a follow-up).  This is just a
> test for blkdebug, though, so it's OK (with the comment fixed, because
> it doesn't prove anything).
I wasn't implying that it's ok to merge bad code, but that at some point 
we have to just paint the shed a color and move on. At the risk of 
excess drama, at what point does a small addition become a complete 
tear-down and rebuild?
> (I'm sorry for not having looked at this series for so long, but qemu is
> not my own, so it isn't like I could pay for my wrong by accepting
> something incomplete -- if it were more important than this single test
> case.)
>
> Also, we do support Python for iotests, and parsing should be simpler
> there.  Since blkdebug is just a debugging driver, I'm fine with not
> knowing whether its features work, though.
>
> Maybe I'll write the test, that would kind of pay for my wrong...

I think the real fix is to make QMP support async IO, but if you'd like 
to do more parsing, that's fine too.

/marc

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2019-02-13 20:49         ` Marc Olson
@ 2019-02-13 21:12           ` Max Reitz
  2019-02-14  6:39           ` Markus Armbruster
  1 sibling, 0 replies; 17+ messages in thread
From: Max Reitz @ 2019-02-13 21:12 UTC (permalink / raw)
  To: Marc Olson, qemu-devel
  Cc: jsnow, qemu-block, Kevin Wolf, Eric Blake, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 6537 bytes --]

On 13.02.19 21:49, Marc Olson wrote:
> On 2/13/19 7:48 AM, Max Reitz wrote:
>> On 12.02.19 22:21, Marc Olson wrote:
>>> On 1/11/19 7:00 AM, Max Reitz wrote:
>>>> On 12.11.18 08:06, Marc Olson wrote:
>> [...]
>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index d4fe710..72f7861 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -3057,6 +3057,34 @@
>>>>>                '*immediately': 'bool' } }
>>>>>      ##
>>>>> +# @BlkdebugDelayOptions:
>>>>> +#
>>>>> +# Describes a single latency injection for blkdebug.
>>>>> +#
>>>>> +# @event:       trigger event
>>>>> +#
>>>>> +# @state:       the state identifier blkdebug needs to be in to
>>>>> +#               actually trigger the event; defaults to "any"
>>>>> +#
>>>>> +# @latency:     The delay to add to an I/O, in microseconds.
>>>>> +#
>>>>> +# @sector:      specifies the sector index which has to be affected
>>>>> +#               in order to actually trigger the event; defaults to
>>>>> "any
>>>>> +#               sector"
>>>>> +#
>>>>> +# @once:        disables further events after this one has been
>>>>> +#               triggered; defaults to false
>>>>> +#
>>>>> +# Since: 3.1
>>>> Well, 4.0 now, sorry...
>>> Baking version numbers into code is downright silly. Every single
>>> version of this patch has made a comment along the lines of, "oops, it
>>> didn't get reviewed in time for the next version bump, so you have to
>>> resubmit." With a fast moving project, this is nonsense. If you're
>>> looking at the code, you should have access to the git history as well
>>> to determine the version.
>> True, but these comments are used to generate documentation (e.g.
>> docs/interopt/qemu-qmp-ref.7 in the build directory).  So they are used
>> by people who don't have access to the git history.
>>
>> It might be possible to generate that information from git-blame when
>> generating the documentation, but how would trivial fixes be handled
>> that are no functional changes?  For instance, it seems difficult to me
>> to distinguish between a spelling change for some parameter description
>> and a change in behavior.
>>
>> (Then again, we shouldn't have such behavioral changes, hm.)
>>
>> To me personally, the easiest thing would seem to be some convention to
>> write ***INSERT VERSION HERE*** into the code and then the maintainer
>> can just find an replace all instances of that when applying the
>> patches.  But that sounds a bit silly...
>>
>> (I don't have an issue with adjusting the version numbers myself as they
>> are, but it's just as hard for me to find and replace all of them as it
>> is for you.  And since you're probably going to send a v4 anyway...)
>>
>> In the end, it's up to the QAPI maintainers.
> 
> IFF I submit by the end of the week, what version number shall I choose?

4.0 is still the next one.

>> [...]
>>
>>>>> diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
>>>>> index 48b4955..976f747 100755
>>>>> --- a/tests/qemu-iotests/071
>>>>> +++ b/tests/qemu-iotests/071
>>>>> @@ -100,6 +100,69 @@ $QEMU_IO -c "open -o
>>>>> driver=$IMGFMT,file.driver=blkdebug,file.inject-error.event
>>>>>             -c 'read -P 42 0x38000 512'
>>>>>      echo
>>>>> +echo "=== Testing blkdebug latency through filename ==="
>>>>> +echo
>>>>> +
>>>>> +$QEMU_IO -c "open -o
>>>>> file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000
>>>>>
>>>>> $TEST_IMG" \
>>>>> +         -c 'aio_write -P 42 0x28000 512' \
>>>>> +         -c 'aio_read -P 42 0x38000 512' \
>>>>> +         | _filter_qemu_io
>>>>> +
>>>>> +echo
>>>>> +echo "=== Testing blkdebug latency through file blockref ==="
>>>>> +echo
>>>>> +
>>>>> +$QEMU_IO -c "open -o
>>>>> driver=$IMGFMT,file.driver=blkdebug,file.inject-delay.event=write_aio,file.inject-delay.latency=10000,file.image.filename=$TEST_IMG"
>>>>>
>>>>> \
>>>>> +         -c 'aio_write -P 42 0x28000 512' \
>>>>> +         -c 'aio_read -P 42 0x38000 512' \
>>>>> +         | _filter_qemu_io
>>>>> +
>>>>> +# Using QMP is synchronous by default, so even though we would
>>>>> +# expect reordering due to using the aio_* commands, they are
>>>>> +# not. The purpose of this test is to verify that the driver
>>>>> +# can be setup via QMP, and IO can complete. See the qemu-io
>>>>> +# test above to prove delay functionality
>>>> But it doesn't prove that because the output is filtered.  To prove it,
>>>> you'd probably need to use null-co as the protocol (so there is as
>>>> little noise as possible) and then parse the qemu-io output to show
>>>> that
>>>> the time is always above 10 ms.
>>>>
>>>> I leave it to you whether you'd like to go through that pain.
>>> There's not a great way to prove it without doing a lot of parsing
>>> changes in testing. I'll consider an update to this patch, but I think
>>> this series has carried on long enough.
>> I agree in this instance, but I'd like to note still that "this series
>> has carried on long enough" is not an argument to merge bad code (or
>> something incomplete without promise of a follow-up).  This is just a
>> test for blkdebug, though, so it's OK (with the comment fixed, because
>> it doesn't prove anything).
> I wasn't implying that it's ok to merge bad code, but that at some point
> we have to just paint the shed a color and move on. At the risk of
> excess drama, at what point does a small addition become a complete
> tear-down and rebuild?

Depends on the case whether a tear down is necessary. :-)

>> (I'm sorry for not having looked at this series for so long, but qemu is
>> not my own, so it isn't like I could pay for my wrong by accepting
>> something incomplete -- if it were more important than this single test
>> case.)
>>
>> Also, we do support Python for iotests, and parsing should be simpler
>> there.  Since blkdebug is just a debugging driver, I'm fine with not
>> knowing whether its features work, though.
>>
>> Maybe I'll write the test, that would kind of pay for my wrong...
> 
> I think the real fix is to make QMP support async IO, but if you'd like
> to do more parsing, that's fine too.

Hm, yeah, that would work, too.  But that's definitely more complicated
than a bit of parsing...

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type
  2019-02-13 20:49         ` Marc Olson
  2019-02-13 21:12           ` Max Reitz
@ 2019-02-14  6:39           ` Markus Armbruster
  1 sibling, 0 replies; 17+ messages in thread
From: Markus Armbruster @ 2019-02-14  6:39 UTC (permalink / raw)
  To: Marc Olson via Qemu-devel
  Cc: Max Reitz, Marc Olson, Kevin Wolf, jsnow, qemu-block

Marc Olson via Qemu-devel <qemu-devel@nongnu.org> writes:

> On 2/13/19 7:48 AM, Max Reitz wrote:
>> On 12.02.19 22:21, Marc Olson wrote:
>>> On 1/11/19 7:00 AM, Max Reitz wrote:
>>>> On 12.11.18 08:06, Marc Olson wrote:
>> [...]
>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index d4fe710..72f7861 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -3057,6 +3057,34 @@
>>>>>                '*immediately': 'bool' } }
>>>>>      ##
>>>>> +# @BlkdebugDelayOptions:
>>>>> +#
>>>>> +# Describes a single latency injection for blkdebug.
>>>>> +#
>>>>> +# @event:       trigger event
>>>>> +#
>>>>> +# @state:       the state identifier blkdebug needs to be in to
>>>>> +#               actually trigger the event; defaults to "any"
>>>>> +#
>>>>> +# @latency:     The delay to add to an I/O, in microseconds.
>>>>> +#
>>>>> +# @sector:      specifies the sector index which has to be affected
>>>>> +#               in order to actually trigger the event; defaults to
>>>>> "any
>>>>> +#               sector"
>>>>> +#
>>>>> +# @once:        disables further events after this one has been
>>>>> +#               triggered; defaults to false
>>>>> +#
>>>>> +# Since: 3.1
>>>> Well, 4.0 now, sorry...
>>> Baking version numbers into code is downright silly. Every single
>>> version of this patch has made a comment along the lines of, "oops, it
>>> didn't get reviewed in time for the next version bump, so you have to
>>> resubmit." With a fast moving project, this is nonsense. If you're
>>> looking at the code, you should have access to the git history as well
>>> to determine the version.
>> True, but these comments are used to generate documentation (e.g.
>> docs/interopt/qemu-qmp-ref.7 in the build directory).  So they are used
>> by people who don't have access to the git history.

Exactly.

>> It might be possible to generate that information from git-blame when
>> generating the documentation, but how would trivial fixes be handled
>> that are no functional changes?  For instance, it seems difficult to me
>> to distinguish between a spelling change for some parameter description
>> and a change in behavior.
>>
>> (Then again, we shouldn't have such behavioral changes, hm.)

Gathering "since" information could be partially automated.  Syntactical
changes are visible in introspection.  Keep a persistent map syntactical
element -> first version, and a program to update it for new stuff in
introspection data since the last release.  Commit the map on release.

Automation would be partial, because we can't rule out the need for
documenting a "since" for a change that isn't syntactical.

While there, have the program report any changes that aren't additions.
These are potential compatibility breakers, and should be reviewed.

>> To me personally, the easiest thing would seem to be some convention to
>> write ***INSERT VERSION HERE*** into the code and then the maintainer
>> can just find an replace all instances of that when applying the
>> patches.  But that sounds a bit silly...

Maintainers are our bottleneck.  Shifting work from submitter to
maintainer makes the bottleneck narrower.

>> (I don't have an issue with adjusting the version numbers myself as they
>> are, but it's just as hard for me to find and replace all of them as it
>> is for you.  And since you're probably going to send a v4 anyway...)
>>
>> In the end, it's up to the QAPI maintainers.
>
> IFF I submit by the end of the week, what version number shall I choose?

IFF you want to be taken seriously, cut the hyperbole.

You raised a valid point.  Don't sabotage yourself.

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-02-14  6:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-12  7:06 [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Marc Olson
2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 2/3] blkdebug: Extend rule check for additional types Marc Olson
2018-11-13 23:22   ` John Snow
2018-11-13 23:34     ` Marc Olson
2018-11-13 23:38       ` John Snow
2019-01-11 14:41   ` Max Reitz
2018-11-12  7:06 ` [Qemu-devel] [PATCH v3 3/3] blkdebug: Add latency injection rule type Marc Olson
2018-11-13 23:57   ` John Snow
2019-01-11 15:00   ` Max Reitz
2019-02-12 21:21     ` Marc Olson
2019-02-13 15:48       ` Max Reitz
2019-02-13 20:49         ` Marc Olson
2019-02-13 21:12           ` Max Reitz
2019-02-14  6:39           ` Markus Armbruster
2018-11-12 11:15 ` [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing Dongli Zhang
2018-11-13 23:00 ` John Snow
2019-01-11 14:36 ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.