From: Hitoshi Mitake
Date: Wed, 27 Aug 2014 10:59:40 +0900
Message-Id: <1409104780-31445-1-git-send-email-mitake.hitoshi@lab.ntt.co.jp>
Subject: [Qemu-devel] [PATCH v2] blkdebug: make the fault injection functionality callable from QMP
To: qemu-devel@nongnu.org
Cc: Hitoshi Mitake, Kevin Wolf, Stefan Hajnoczi, mitake.hitoshi@gmail.com

This patch makes the fault injection functionality of blkdebug callable
from QMP. The motivation for this change is testing and debugging
distributed systems. Ordinary distributed systems must handle hardware
faults, since tolerating them is their reason for existence, but testing
whether such systems can handle these faults and recover correctly is
really hard.

Typically, developers of distributed systems check such recovery paths
with unit tests or an artificial environment that can be built in a
single box. But such tests can miss important attributes of real-world
hardware faults.
Examples of disk drive faults:
 - write(2) doesn't return -1 immediately on a disk error, even if the
   target file is opened with O_SYNC, if the file system holding the
   file is not mounted with the barrier option
 - some disks suddenly become silent without reporting errors, so
   applications must handle such a case with a finely tuned disk I/O
   timeout
 - some disks degrade in performance instead of stopping and reporting
   errors [1]

For testing the recovery paths and configuration of distributed systems,
mocking faults like the above in virtual devices is effective, because
ordinary testing techniques that target errors of library APIs and
system calls cannot mock such faults. In addition, injecting faults at
the level of virtual devices can test the whole stack of the target
system (from device drivers to applications).

As a first step toward this testing technique, this patch implements a
new QMP command which updates the error injection rules of blkdebug. I
think this is more useful for testing distributed systems than the
existing config-file-based fault injection of blkdebug, because users
can inject faults at any time.

With this feature, I found a potential problem in the deployment guide
of OpenStack Swift [2]. The guide suggests the nobarrier option of xfs
without any caution. The option degrades the durability of a Swift
cluster because it delays detection of disk errors. In addition, the
option is not suggested in a Swift guide book [3]. So I concluded that
the guide [2] can lead to a misconfiguration of Swift. I believe this
sort of problem can be found in other systems, so the feature is useful
for developers and admins of distributed systems.
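Each rule passed to the new command is a JSON object whose keys
("event", "type", "state", "errno", "sector", "once", "immediately",
"new_state") are read by the handler in blkdebug.c. As a hedged sketch
only, a C test harness might format a request carrying one inject-error
rule as below. format_inject_error_cmd() is a hypothetical helper, not
part of QEMU, and it assumes the device and event names need no JSON
escaping. It emits "once" and "immediately" as JSON booleans because the
handler reads them with qdict_get_try_bool(), which in the QEMU tree of
this era appears to fall back to the default for non-boolean values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): format a blkdebug-set-rules
 * request carrying a single inject-error rule. Assumes device and event
 * contain no characters that need JSON escaping. Returns the length of
 * the formatted string, or -1 if it did not fit into buf.
 */
int format_inject_error_cmd(char *buf, size_t len, const char *device,
                            const char *event, int state, int err_no,
                            bool once, bool immediately)
{
    int n;

    assert(buf != NULL && device != NULL && event != NULL);
    n = snprintf(buf, len,
                 "{\"execute\": \"blkdebug-set-rules\", \"arguments\": "
                 "{\"device\": \"%s\", \"rules\": [{\"event\": \"%s\", "
                 "\"type\": \"inject-error\", \"state\": %d, "
                 "\"errno\": %d, \"once\": %s, \"immediately\": %s}]}}",
                 device, event, state, err_no,
                 /* JSON booleans, not 0/1, for qdict_get_try_bool() */
                 once ? "true" : "false", immediately ? "true" : "false");
    if (n < 0 || (size_t)n >= len) {
        return -1; /* output error or truncation */
    }
    return n;
}
```

The resulting string would be sent over the QMP socket after the initial
qmp_capabilities handshake, in place of a hand-typed request.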
Example of launching QEMU with this feature:

sudo x86_64-softmmu/qemu-system-x86_64 -qmp \
  tcp:localhost:4444,server,nowait -enable-kvm -hda \
  blkdebug:/dev/null:/tmp/debian.qcow2

(/dev/null is needed because blkdebug requires a configuration file, but
for QMP purposes an empty file is enough.)

Example of QMP sequence (via telnet localhost 4444):

{ "execute": "qmp_capabilities" }
{"return": {}}
{"execute": "blkdebug-set-rules", "arguments": {"device": "ide0-hd0",
 "rules":[{"event": "write_aio", "type": "inject-error",
 "immediately": true, "once": false, "state": 1}]}} # <- inject error to /dev/sda
{"return": {}}

Now the guest OS on the VM finds that the disk is broken.

Of course, using QMP directly is painful for users (developers and
admins of distributed systems). I'm implementing a user-friendly
interface in vagrant-kvm [4] for black-box testing. In addition, a
testing framework for injecting faults at critical timing (which
requires a solid understanding of the target systems) is in progress.

[1] http://ucare.cs.uchicago.edu/pdf/socc13-limplock.pdf
[2] http://docs.openstack.org/developer/swift/howto_installmultinode.html
[3] http://www.amazon.com/dp/B00C93QFHI
[4] https://github.com/adrahon/vagrant-kvm

Cc: Eric Blake
Cc: Kevin Wolf
Cc: Stefan Hajnoczi
Signed-off-by: Hitoshi Mitake
---
 block/blkdebug.c      | 199 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |   2 +
 qapi-schema.json      |  14 ++++
 qmp-commands.hx       |  18 +++++
 4 files changed, 233 insertions(+)

v2:
 - don't prepare a new mechanism for fault injection; implement the
   feature by updating the fault rules of blkdebug
 - add an example of the QMP command

diff --git a/block/blkdebug.c b/block/blkdebug.c
index f51407d..2b9d616 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -687,6 +687,205 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
     return bdrv_getlength(bs->file);
 }
 
+struct qmp_rules_list_iter {
+    bool failed;
+    QemuOpts *set_state, *inject_error;
+
+    Error *err;
+};
+
+static void rules_list_iter(QObject *obj, void *opaque)
+{
+    struct qmp_rules_list_iter *iter = (struct qmp_rules_list_iter *)opaque;
+    QemuOpts *new_opts;
+    QDict *dict;
+    Error *err = NULL;
+    const char *type;
+
+    const char *event_name;
+    int state;
+
+    if (iter->failed) {
+        /* do nothing anymore */
+        return;
+    }
+
+    dict = qobject_to_qdict(obj);
+    if (!dict) {
+        error_set(&iter->err, QERR_INVALID_PARAMETER_TYPE,
+                  "member of rules", "dict");
+        goto fail;
+    }
+
+    event_name = qdict_get_try_str(dict, "event");
+    if (!event_name) {
+        error_set(&iter->err, QERR_MISSING_PARAMETER, "event");
+        goto fail;
+    }
+
+    state = qdict_get_try_int(dict, "state", 0);
+
+    type = qdict_get_try_str(dict, "type");
+    if (type && !strcmp(type, "set-state")) {
+        int new_state;
+
+        if (iter->set_state) {
+            error_setg(&iter->err, "duplicate entry for set-state");
+            goto fail;
+        }
+
+        new_opts = qemu_opts_create(&set_state_opts, NULL, 0, &err);
+        if (!new_opts) {
+            iter->err = err;
+            goto fail;
+        }
+
+        iter->set_state = new_opts;
+
+        new_state = qdict_get_try_int(dict, "new_state", 0);
+        if (qemu_opt_set_number(new_opts, "new_state", new_state) < 0) {
+            error_setg(&iter->err, "failed to set new_state");
+            goto fail;
+        }
+    } else if (type && !strcmp(type, "inject-error")) {
+        int _errno, sector;
+        bool once, immediately;
+
+        if (iter->inject_error) {
+            error_setg(&iter->err, "duplicate entry for inject-error");
+            goto fail;
+        }
+
+        new_opts = qemu_opts_create(&inject_error_opts, NULL, 0, &err);
+        if (!new_opts) {
+            iter->err = err;
+            goto fail;
+        }
+
+        iter->inject_error = new_opts;
+
+        _errno = qdict_get_try_int(dict, "errno", EIO);
+        if (qemu_opt_set_number(new_opts, "errno", _errno) < 0) {
+            error_setg(&iter->err, "failed to set errno");
+            goto fail;
+        }
+
+        sector = qdict_get_try_int(dict, "sector", -1);
+        if (qemu_opt_set_number(new_opts, "sector", sector) < 0) {
+            error_setg(&iter->err, "failed to set sector");
+            goto fail;
+        }
+
+        once = qdict_get_try_bool(dict, "once", 0);
+        if (qemu_opt_set_bool(new_opts, "once", once) < 0) {
+            error_setg(&iter->err, "failed to set once");
+            goto fail;
+        }
+
+        immediately = qdict_get_try_bool(dict, "immediately", 0);
+        if (qemu_opt_set_bool(new_opts, "immediately", immediately) < 0) {
+            error_setg(&iter->err, "failed to set immediately");
+            goto fail;
+        }
+    } else {
+        error_setg(&iter->err, "unknown type of rule: %s", type ? type : "");
+        goto fail;
+    }
+
+    if (qemu_opt_set_number(new_opts, "state", state) < 0) {
+        error_setg(&iter->err, "failed to set state");
+        goto fail;
+    }
+
+    if (qemu_opt_set(new_opts, "event", event_name) < 0) {
+        error_setg(&iter->err, "failed to set event");
+        goto fail;
+    }
+
+    return;
+
+fail:
+    iter->failed = true;
+}
+
+int qmp_blkdebug_set_rules(Monitor *mon, const QDict *qdict, QObject **ret)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    QObject *rules = qdict_get(qdict, "rules");
+    const QList *rules_list = NULL;
+    Error *local_err = NULL;
+    BlockDriverState *bs;
+    BDRVBlkdebugState *s;
+    struct qmp_rules_list_iter iter = { 0 };
+    struct add_rule_data d;
+
+    if (!device) {
+        error_set(&local_err, QERR_MISSING_PARAMETER, "device");
+        goto out;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(&local_err, QERR_DEVICE_NOT_FOUND, device);
+        goto out;
+    }
+
+    bs = bs->file;
+    if (!bs || !bs->drv || strcmp(bs->drv->format_name, "blkdebug")) {
+        error_setg(&local_err, "Device '%s' is not backed by blkdebug",
+                   device);
+        goto out;
+    }
+    s = bs->opaque;
+
+    if (!rules) {
+        error_set(&local_err, QERR_MISSING_PARAMETER, "rules");
+        goto out;
+    }
+
+    rules_list = qobject_to_qlist(rules);
+    if (!rules_list) {
+        error_set(&local_err, QERR_INVALID_PARAMETER_TYPE, "rules", "list");
+        goto out;
+    }
+
+    memset(&iter, 0, sizeof(iter));
+    qlist_iter(rules_list, rules_list_iter, &iter);
+    if (iter.failed) {
+        local_err = iter.err;
+        goto out;
+    }
+
+    d.s = s;
+    s->state = 1;
+    if (iter.inject_error) {
+        d.action = ACTION_INJECT_ERROR;
+        add_rule(iter.inject_error, &d);
+    }
+
+    if (iter.set_state) {
+        d.action = ACTION_SET_STATE;
+        add_rule(iter.set_state, &d);
+    }
+
+out:
+    if (iter.inject_error) {
+        qemu_opts_del(iter.inject_error);
+    }
+
+    if (iter.set_state) {
+        qemu_opts_del(iter.set_state);
+    }
+
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    return 0;
+}
+
 static BlockDriver bdrv_blkdebug = {
     .format_name = "blkdebug",
     .protocol_name = "blkdebug",
diff --git a/include/block/block.h b/include/block/block.h
index f08471d..421a1b5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -588,4 +588,6 @@ void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
 void bdrv_flush_io_queue(BlockDriverState *bs);
 
+int qmp_blkdebug_set_rules(Monitor *mon, const QDict *qdict, QObject **ret);
+
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 341f417..13bab1d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3481,3 +3481,17 @@
 # Since: 2.1
 ##
 { 'command': 'rtc-reset-reinjection' }
+
+##
+# @blkdebug-set-rules
+#
+# Set rules of blkdebug for the given block device.
+#
+# @device: device ID of target block device
+# @rules: list of rules to set, each of which is a dictionary
+#
+# Since: 2.2
+##
+{ 'command': 'blkdebug-set-rules',
+  'data': { 'device': 'str', 'rules': [ 'dict' ] },
+  'gen': 'no'}
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..ef42cf0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3755,3 +3755,21 @@ Example:
 <- { "return": {} }
 
 EQMP
+    {
+        .name = "blkdebug-set-rules",
+        .args_type = "device:s,rules:q",
+        .mhandler.cmd_new = qmp_blkdebug_set_rules,
+    },
+SQMP
+blkdebug-set-rules
+------------------
+
+Set blkdebug rules
+
+Example:
+-> {"execute": "blkdebug-set-rules", "arguments": {"device":
+    "ide0-hd0", "rules":[{"event": "write_aio", "type": "inject-error",
+    "immediately": true, "once": false, "state": 1}]}}
+<- { "return": {} }
+
+EQMP
-- 
1.8.3.2
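Driving the blkdebug-set-rules exchange documented in qmp-commands.hx
from a test harness mostly means writing one JSON command per line and
reading back the reply. The following is a minimal POSIX sketch under
stated assumptions: qmp_connect(), qmp_exec() and qmp_read_line() are
hypothetical helper names, the monitor is assumed to be a
"-qmp tcp:...,server" socket as in the launch example, and asynchronous
QMP events are skipped with a naive substring check rather than real
JSON parsing.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read one '\n'-terminated line into buf, dropping "\r\n".
 * Overlong lines are truncated. Returns the length, or -1 on EOF/error. */
int qmp_read_line(int fd, char *buf, size_t len)
{
    size_t n = 0;
    char c;

    assert(len > 0);
    for (;;) {
        ssize_t r = read(fd, &c, 1);
        if (r <= 0) {
            return -1;
        }
        if (c == '\n') {
            break;
        }
        if (n + 1 < len) {
            buf[n++] = c;
        }
    }
    if (n > 0 && buf[n - 1] == '\r') {
        n--;
    }
    buf[n] = '\0';
    return (int)n;
}

/* Send one command line, then return the first reply line that is not
 * an asynchronous event (naive check: contains "event"). */
int qmp_exec(int fd, const char *cmd, char *reply, size_t len)
{
    size_t off = 0, cmdlen = strlen(cmd);

    while (off < cmdlen) {
        ssize_t w = write(fd, cmd + off, cmdlen - off);
        if (w < 0) {
            return -1;
        }
        off += (size_t)w;
    }
    if (write(fd, "\n", 1) != 1) {
        return -1;
    }
    do {
        if (qmp_read_line(fd, reply, len) < 0) {
            return -1;
        }
    } while (strstr(reply, "\"event\"") != NULL);
    return 0;
}

/* Connect to host:port and consume the QMP greeting line.
 * Returns a connected fd, or -1 on failure. */
int qmp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    char greeting[512];
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0) {
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            break;
        }
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd >= 0 && qmp_read_line(fd, greeting, sizeof(greeting)) < 0) {
        close(fd);
        fd = -1;
    }
    return fd;
}
```

For instance, after fd = qmp_connect("localhost", "4444"), a harness
would call qmp_exec() once with {"execute": "qmp_capabilities"} and then
with a blkdebug-set-rules request, checking that each reply contains
"return" rather than "error".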