qemu-devel.nongnu.org archive mirror
* [PATCH v3 0/9] block-backend: Introduce I/O hang
@ 2020-10-22 13:02 Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 1/9] block-backend: introduce I/O rehandle info Jiahui Cen
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:02 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

A VM in a cloud environment may use a virtual disk as its backend storage,
and there are usually filesystems on the virtual block device. When the backend
storage is temporarily down, any I/O issued to the virtual block device will
cause an error. For example, an error in an ext4 filesystem can make the
filesystem read-only. However, cloud backend storage can often be recovered
quickly; for example, an IP-SAN may go down due to a network failure and come
back online soon after the network recovers. The error in the filesystem,
though, may not be recovered without a device reattach or a system restart. So
an I/O rehandle mechanism is needed to implement self-healing.

This patch series proposes a feature called I/O hang. It rehandles AIOs that
fail with EIO instead of returning the error to the guest. From the guest's
perspective, it just looks as if the I/O is hanging rather than failing. With
this feature enabled, the guest resumes running smoothly once I/O recovers.

v2->v3:
* Add a doc to describe I/O hang.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect remove of rehandle list.
* Provide rehandle pause interface.

Jiahui Cen (9):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O rehandle pause/unpause
  block-backend: enable I/O hang when timeout is set
  virtio-blk: pause I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event
  docs: add a doc about I/O hang

 block/block-backend.c          | 300 +++++++++++++++++++++++++++++++++
 blockdev.c                     |  11 ++
 docs/io-hang.rst               |  45 +++++
 hw/block/virtio-blk.c          |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json           |  26 +++
 6 files changed, 395 insertions(+)
 create mode 100644 docs/io-hang.rst

-- 
2.19.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/9] block-backend: introduce I/O rehandle info
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
@ 2020-10-22 13:02 ` Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 2/9] block-backend: rehandle block aios when EIO Jiahui Cen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:02 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

The I/O hang feature is built on a rehandle mechanism. Each block backend
keeps a list of hanging block AIOs and a timer that regularly resends these
AIOs. In order to reissue an AIO, each block AIO also needs to store its
coroutine entry.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 block/block-backend.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index ce78d30794..b8367d82cc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -35,6 +35,18 @@
 
 static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
+/* block backend rehandle timer interval 5s */
+#define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
+
+typedef struct BlockBackendRehandleInfo {
+    bool enable;
+    QEMUTimer *ts;
+    unsigned timer_interval_ms;
+
+    unsigned int in_flight;
+    QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios;
+} BlockBackendRehandleInfo;
+
 typedef struct BlockBackendAioNotifier {
     void (*attached_aio_context)(AioContext *new_context, void *opaque);
     void (*detach_aio_context)(void *opaque);
@@ -95,6 +107,8 @@ struct BlockBackend {
      * Accessed with atomic ops.
      */
     unsigned int in_flight;
+
+    BlockBackendRehandleInfo reinfo;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -350,6 +364,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
     qemu_co_queue_init(&blk->queued_requests);
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
+
     QLIST_INIT(&blk->aio_notifiers);
 
     QTAILQ_INSERT_TAIL(&block_backends, blk, link);
@@ -1392,6 +1407,10 @@ typedef struct BlkAioEmAIOCB {
     BlkRwCo rwco;
     int bytes;
     bool has_returned;
+
+    /* for rehandle */
+    CoroutineEntry *co_entry;
+    QTAILQ_ENTRY(BlkAioEmAIOCB) list;
 } BlkAioEmAIOCB;
 
 static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/9] block-backend: rehandle block aios when EIO
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 1/9] block-backend: introduce I/O rehandle info Jiahui Cen
@ 2020-10-22 13:02 ` Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 3/9] block-backend: add I/O hang timeout Jiahui Cen
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:02 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

When a backend device temporarily stops responding, e.g. a network disk goes
down due to a network fault, any I/O to the corresponding virtual block device
in the VM returns an I/O error. If the hypervisor passes the error on to the VM,
the filesystem on this block device may stop working as usual. In many
situations, the returned error is EIO.

To avoid this unavailability, we can store the failed AIOs and resend them
later. If the error is temporary, the retries can succeed and the AIOs can
be completed successfully.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 block/block-backend.c | 89 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index b8367d82cc..8050669d23 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -365,6 +365,12 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
 
+    /* for rehandle */
+    blk->reinfo.enable = false;
+    blk->reinfo.ts = NULL;
+    qatomic_set(&blk->reinfo.in_flight, 0);
+    QTAILQ_INIT(&blk->reinfo.re_aios);
+
     QLIST_INIT(&blk->aio_notifiers);
 
     QTAILQ_INSERT_TAIL(&block_backends, blk, link);
@@ -1425,8 +1431,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
     .get_aio_context    = blk_aio_em_aiocb_get_aio_context,
 };
 
+static void blk_rehandle_timer_cb(void *opaque);
+static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb);
+
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
+    if (acb->rwco.blk->reinfo.enable) {
+        blk_rehandle_aio_complete(acb);
+        return;
+    }
+
     if (acb->has_returned) {
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         blk_dec_in_flight(acb->rwco.blk);
@@ -1459,6 +1473,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
         .ret    = NOT_DONE,
     };
     acb->bytes = bytes;
+    acb->co_entry = co_entry;
     acb->has_returned = false;
 
     co = qemu_coroutine_create(co_entry, acb);
@@ -2054,6 +2069,20 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
             throttle_group_attach_aio_context(tgm, new_context);
             bdrv_drained_end(bs);
         }
+
+        if (blk->reinfo.enable) {
+            if (blk->reinfo.ts) {
+                timer_del(blk->reinfo.ts);
+                timer_free(blk->reinfo.ts);
+            }
+            blk->reinfo.ts = aio_timer_new(new_context, QEMU_CLOCK_REALTIME,
+                                           SCALE_MS, blk_rehandle_timer_cb,
+                                           blk);
+            if (qatomic_read(&blk->reinfo.in_flight)) {
+                timer_mod(blk->reinfo.ts,
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
+            }
+        }
     }
 
     blk->ctx = new_context;
@@ -2406,6 +2435,66 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
     }
 }
 
+static void blk_rehandle_insert_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
+{
+    assert(blk->reinfo.enable);
+
+    qatomic_inc(&blk->reinfo.in_flight);
+    QTAILQ_INSERT_TAIL(&blk->reinfo.re_aios, acb, list);
+    timer_mod(blk->reinfo.ts, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                              blk->reinfo.timer_interval_ms);
+}
+
+static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
+{
+    QTAILQ_REMOVE(&blk->reinfo.re_aios, acb, list);
+    qatomic_dec(&blk->reinfo.in_flight);
+}
+
+static void blk_rehandle_timer_cb(void *opaque)
+{
+    BlockBackend *blk = opaque;
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+    BlkAioEmAIOCB *acb, *tmp;
+    Coroutine *co;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    QTAILQ_FOREACH_SAFE(acb, &reinfo->re_aios, list, tmp) {
+        if (acb->rwco.ret == NOT_DONE) {
+            continue;
+        }
+
+        blk_inc_in_flight(acb->rwco.blk);
+        acb->rwco.ret = NOT_DONE;
+        acb->has_returned = false;
+        blk_rehandle_remove_aiocb(acb->rwco.blk, acb);
+
+        co = qemu_coroutine_create(acb->co_entry, acb);
+        qemu_coroutine_enter(co);
+
+        acb->has_returned = true;
+        if (acb->rwco.ret != NOT_DONE) {
+            replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                             blk_aio_complete_bh, acb);
+        }
+    }
+    aio_context_release(blk_get_aio_context(blk));
+}
+
+static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
+{
+    if (acb->has_returned) {
+        blk_dec_in_flight(acb->rwco.blk);
+        if (acb->rwco.ret == -EIO) {
+            blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
+            return;
+        }
+
+        acb->common.cb(acb->common.opaque, acb->rwco.ret);
+        qemu_aio_unref(acb);
+    }
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
     bdrv_register_buf(blk_bs(blk), host, size);
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/9] block-backend: add I/O hang timeout
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 1/9] block-backend: introduce I/O rehandle info Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 2/9] block-backend: rehandle block aios when EIO Jiahui Cen
@ 2020-10-22 13:02 ` Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause Jiahui Cen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:02 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

Not all errors can be recovered from, so it is better to add a rehandle timeout
to I/O hang.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 block/block-backend.c          | 99 +++++++++++++++++++++++++++++++++-
 include/sysemu/block-backend.h |  2 +
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 8050669d23..90fcc678b5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -38,6 +38,11 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
 
+enum BlockIOHangStatus {
+    BLOCK_IO_HANG_STATUS_NORMAL = 0,
+    BLOCK_IO_HANG_STATUS_HANG,
+};
+
 typedef struct BlockBackendRehandleInfo {
     bool enable;
     QEMUTimer *ts;
@@ -109,6 +114,11 @@ struct BlockBackend {
     unsigned int in_flight;
 
     BlockBackendRehandleInfo reinfo;
+
+    int64_t iohang_timeout; /* The I/O hang timeout value in sec. */
+    int64_t iohang_time;    /* The I/O hang start time */
+    bool is_iohang_timeout;
+    int iohang_status;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -2481,20 +2491,107 @@ static void blk_rehandle_timer_cb(void *opaque)
     aio_context_release(blk_get_aio_context(blk));
 }
 
+static bool blk_iohang_handle(BlockBackend *blk, int new_status)
+{
+    int64_t now;
+    int old_status = blk->iohang_status;
+    bool need_rehandle = false;
+
+    switch (new_status) {
+    case BLOCK_IO_HANG_STATUS_NORMAL:
+        if (old_status == BLOCK_IO_HANG_STATUS_HANG) {
+            /* Case when I/O Hang is recovered */
+            blk->is_iohang_timeout = false;
+            blk->iohang_time = 0;
+        }
+        break;
+    case BLOCK_IO_HANG_STATUS_HANG:
+        if (old_status != BLOCK_IO_HANG_STATUS_HANG) {
+            /* Case when I/O hang is first triggered */
+            blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+            need_rehandle = true;
+        } else {
+            if (!blk->is_iohang_timeout) {
+                now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+                if (now >= (blk->iohang_time + blk->iohang_timeout)) {
+                    /* Case when I/O hang is timeout */
+                    blk->is_iohang_timeout = true;
+                } else {
+                    /* Case when I/O hang is continued */
+                    need_rehandle = true;
+                }
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    blk->iohang_status = new_status;
+    return need_rehandle;
+}
+
+static bool blk_rehandle_aio(BlkAioEmAIOCB *acb, bool *has_timeout)
+{
+    bool need_rehandle = false;
+
+    /* Rehandle aio which returns EIO before hang timeout */
+    if (acb->rwco.ret == -EIO) {
+        if (acb->rwco.blk->is_iohang_timeout) {
+            /* I/O hang has timeout and not recovered */
+            *has_timeout = true;
+        } else {
+            need_rehandle = blk_iohang_handle(acb->rwco.blk,
+                                              BLOCK_IO_HANG_STATUS_HANG);
+            /* I/O hang timeout first trigger */
+            if (acb->rwco.blk->is_iohang_timeout) {
+                *has_timeout = true;
+            }
+        }
+    }
+
+    return need_rehandle;
+}
+
 static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
 {
+    bool has_timeout = false;
+    bool need_rehandle = false;
+
     if (acb->has_returned) {
         blk_dec_in_flight(acb->rwco.blk);
-        if (acb->rwco.ret == -EIO) {
+        need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+        if (need_rehandle) {
             blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
             return;
         }
 
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
+
+        /* I/O hang return to normal status */
+        if (!has_timeout) {
+            blk_iohang_handle(acb->rwco.blk, BLOCK_IO_HANG_STATUS_NORMAL);
+        }
+
         qemu_aio_unref(acb);
     }
 }
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
+{
+    if (!blk) {
+        return;
+    }
+
+    blk->is_iohang_timeout = false;
+    blk->iohang_time = 0;
+    blk->iohang_timeout = 0;
+    blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
+    if (iohang_timeout > 0) {
+        blk->iohang_timeout = iohang_timeout;
+    }
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
     bdrv_register_buf(blk_bs(blk), host, size);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..bfebe3a960 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -268,4 +268,6 @@ const BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout);
+
 #endif
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (2 preceding siblings ...)
  2020-10-22 13:02 ` [PATCH v3 3/9] block-backend: add I/O hang timeout Jiahui Cen
@ 2020-10-22 13:02 ` Jiahui Cen
  2020-10-22 13:02 ` [PATCH v3 5/9] block-backend: enable I/O hang when timeout is set Jiahui Cen
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:02 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

Sometimes there is no need to rehandle AIOs even though I/O hang is enabled. For
example, when deleting a block backend, we have to wait for its AIOs to complete
by calling blk_drain(), but we do not care about their results. So a pause
interface for I/O hang is helpful to bypass the rehandle mechanism.
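
The intended call pattern is roughly the sketch below (patch 6 applies it to
virtio-blk reset; the surrounding cleanup step is only illustrative):

    blk_rehandle_pause(blk);    /* drain queued retries; new failures are not queued */
    blk_drain(blk);             /* wait for the remaining in-flight requests */
    /* ... work that must not race with retried AIOs ... */
    blk_rehandle_unpause(blk);  /* resume the rehandle mechanism */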

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 block/block-backend.c          | 60 +++++++++++++++++++++++++++++++---
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 90fcc678b5..c16d95a2c9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -37,6 +37,9 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
+#define BLOCK_BACKEND_REHANDLE_NORMAL           1
+#define BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED  2
+#define BLOCK_BACKEND_REHANDLE_DRAINED          3
 
 enum BlockIOHangStatus {
     BLOCK_IO_HANG_STATUS_NORMAL = 0,
@@ -50,6 +53,8 @@ typedef struct BlockBackendRehandleInfo {
 
     unsigned int in_flight;
     QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios;
+
+    int status;
 } BlockBackendRehandleInfo;
 
 typedef struct BlockBackendAioNotifier {
@@ -2461,6 +2466,51 @@ static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
     qatomic_dec(&blk->reinfo.in_flight);
 }
 
+static void blk_rehandle_drain(BlockBackend *blk)
+{
+    if (blk_bs(blk)) {
+        bdrv_drained_begin(blk_bs(blk));
+        BDRV_POLL_WHILE(blk_bs(blk), qatomic_read(&blk->reinfo.in_flight) > 0);
+        bdrv_drained_end(blk_bs(blk));
+    }
+}
+
+static bool blk_rehandle_is_paused(BlockBackend *blk)
+{
+    return blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED ||
+           blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAINED;
+}
+
+void blk_rehandle_pause(BlockBackend *blk)
+{
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    if (!reinfo->enable || reinfo->status == BLOCK_BACKEND_REHANDLE_DRAINED) {
+        aio_context_release(blk_get_aio_context(blk));
+        return;
+    }
+
+    reinfo->status = BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED;
+    blk_rehandle_drain(blk);
+    reinfo->status = BLOCK_BACKEND_REHANDLE_DRAINED;
+    aio_context_release(blk_get_aio_context(blk));
+}
+
+void blk_rehandle_unpause(BlockBackend *blk)
+{
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    if (!reinfo->enable || reinfo->status == BLOCK_BACKEND_REHANDLE_NORMAL) {
+        aio_context_release(blk_get_aio_context(blk));
+        return;
+    }
+
+    reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL;
+    aio_context_release(blk_get_aio_context(blk));
+}
+
 static void blk_rehandle_timer_cb(void *opaque)
 {
     BlockBackend *blk = opaque;
@@ -2560,10 +2610,12 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
 
     if (acb->has_returned) {
         blk_dec_in_flight(acb->rwco.blk);
-        need_rehandle = blk_rehandle_aio(acb, &has_timeout);
-        if (need_rehandle) {
-            blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
-            return;
+        if (!blk_rehandle_is_paused(acb->rwco.blk)) {
+            need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+            if (need_rehandle) {
+                blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
+                return;
+            }
         }
 
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index bfebe3a960..391a047444 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -268,6 +268,8 @@ const BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
+void blk_rehandle_pause(BlockBackend *blk);
+void blk_rehandle_unpause(BlockBackend *blk);
 void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout);
 
 #endif
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 5/9] block-backend: enable I/O hang when timeout is set
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (3 preceding siblings ...)
  2020-10-22 13:02 ` [PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause Jiahui Cen
@ 2020-10-22 13:02 ` Jiahui Cen
  2020-10-22 13:03 ` [PATCH v3 6/9] virtio-blk: pause I/O hang when resetting Jiahui Cen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:02 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

Setting a non-zero I/O hang timeout indicates that I/O hang is enabled for the
block backend. When the block backend is about to be deleted, we should
disable I/O hang.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 block/block-backend.c          | 40 ++++++++++++++++++++++++++++++++++
 include/sysemu/block-backend.h |  1 +
 2 files changed, 41 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index c16d95a2c9..c812b3a9c7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -34,6 +34,7 @@
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
 static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
+static void blk_rehandle_disable(BlockBackend *blk);
 
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
@@ -476,6 +477,8 @@ static void blk_delete(BlockBackend *blk)
     assert(!blk->refcnt);
     assert(!blk->name);
     assert(!blk->dev);
+    assert(qatomic_read(&blk->reinfo.in_flight) == 0);
+    blk_rehandle_disable(blk);
     if (blk->public.throttle_group_member.throttle_state) {
         blk_io_limits_disable(blk);
     }
@@ -2629,6 +2632,42 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
     }
 }
 
+static void blk_rehandle_enable(BlockBackend *blk)
+{
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    if (reinfo->enable) {
+        aio_context_release(blk_get_aio_context(blk));
+        return;
+    }
+
+    reinfo->ts = aio_timer_new(blk_get_aio_context(blk), QEMU_CLOCK_REALTIME,
+                               SCALE_MS, blk_rehandle_timer_cb, blk);
+    reinfo->timer_interval_ms = BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL;
+    reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL;
+    reinfo->enable = true;
+    aio_context_release(blk_get_aio_context(blk));
+}
+
+static void blk_rehandle_disable(BlockBackend *blk)
+{
+    if (!blk->reinfo.enable) {
+        return;
+    }
+
+    blk_rehandle_pause(blk);
+    timer_del(blk->reinfo.ts);
+    timer_free(blk->reinfo.ts);
+    blk->reinfo.ts = NULL;
+    blk->reinfo.enable = false;
+}
+
+bool blk_iohang_is_enabled(BlockBackend *blk)
+{
+    return blk->iohang_timeout != 0;
+}
+
 void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
 {
     if (!blk) {
@@ -2641,6 +2680,7 @@ void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
     blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
     if (iohang_timeout > 0) {
         blk->iohang_timeout = iohang_timeout;
+        blk_rehandle_enable(blk);
     }
 }
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 391a047444..851b90b99b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -270,6 +270,7 @@ int blk_make_empty(BlockBackend *blk, Error **errp);
 
 void blk_rehandle_pause(BlockBackend *blk);
 void blk_rehandle_unpause(BlockBackend *blk);
+bool blk_iohang_is_enabled(BlockBackend *blk);
 void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout);
 
 #endif
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 6/9] virtio-blk: pause I/O hang when resetting
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (4 preceding siblings ...)
  2020-10-22 13:02 ` [PATCH v3 5/9] block-backend: enable I/O hang when timeout is set Jiahui Cen
@ 2020-10-22 13:03 ` Jiahui Cen
  2020-10-22 13:03 ` [PATCH v3 7/9] qemu-option: add I/O hang timeout option Jiahui Cen
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:03 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

When resetting virtio-blk, we have to drain all AIOs but do not care about
their results. So it is necessary to pause I/O hang before resetting
virtio-blk, and unpause it after the reset.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 hw/block/virtio-blk.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index bac2d6fa2b..f96e4ac274 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -899,6 +899,10 @@ static void virtio_blk_reset(VirtIODevice *vdev)
     AioContext *ctx;
     VirtIOBlockReq *req;
 
+    if (blk_iohang_is_enabled(s->blk)) {
+        blk_rehandle_pause(s->blk);
+    }
+
     ctx = blk_get_aio_context(s->blk);
     aio_context_acquire(ctx);
     blk_drain(s->blk);
@@ -916,6 +920,10 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 
     assert(!s->dataplane_started);
     blk_set_enable_write_cache(s->blk, s->original_wce);
+
+    if (blk_iohang_is_enabled(s->blk)) {
+        blk_rehandle_unpause(s->blk);
+    }
 }
 
 /* coalesce internal state, copy to pci i/o region 0
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 7/9] qemu-option: add I/O hang timeout option
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (5 preceding siblings ...)
  2020-10-22 13:03 ` [PATCH v3 6/9] virtio-blk: pause I/O hang when resetting Jiahui Cen
@ 2020-10-22 13:03 ` Jiahui Cen
  2020-10-22 13:03 ` [PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event Jiahui Cen
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:03 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

The appropriate I/O hang timeout differs between use cases. So it is better to
provide an option that lets the user set the I/O hang timeout for each block
device.
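
For example (a sketch; the iohang-timeout option is the one added below, the
rest of the drive options are only illustrative), a 30 second rehandle timeout
can be requested with:

    -drive file=disk.img,format=raw,iohang-timeout=30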

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 blockdev.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index fe6fb5dc1d..0a77eedfc4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -501,6 +501,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
     BlockdevDetectZeroesOptions detect_zeroes =
         BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
     const char *throttling_group = NULL;
+    int64_t iohang_timeout = 0;
 
     /* Check common options by copying from bs_opts to opts, all other options
      * stay in bs_opts for processing by bdrv_open(). */
@@ -623,6 +624,12 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 
         bs->detect_zeroes = detect_zeroes;
 
+        /* init timeout value for I/O Hang */
+        iohang_timeout = qemu_opt_get_number(opts, "iohang-timeout", 0);
+        if (iohang_timeout > 0) {
+            blk_iohang_init(blk, iohang_timeout);
+        }
+
         block_acct_setup(blk_get_stats(blk), account_invalid, account_failed);
 
         if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) {
@@ -3796,6 +3803,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "whether to account for failed I/O operations "
                     "in the statistics",
+        },{
+            .name = "iohang-timeout",
+            .type = QEMU_OPT_NUMBER,
+            .help = "timeout value for I/O Hang",
         },
         { /* end of list */ }
     },
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (6 preceding siblings ...)
  2020-10-22 13:03 ` [PATCH v3 7/9] qemu-option: add I/O hang timeout option Jiahui Cen
@ 2020-10-22 13:03 ` Jiahui Cen
  2020-10-22 13:03 ` [PATCH v3 9/9] docs: add a doc about I/O hang Jiahui Cen
  2020-10-26 16:53 ` [PATCH v3 0/9] block-backend: Introduce " Stefan Hajnoczi
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:03 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

Hypervisor management tools like libvirt may need to monitor I/O hang events.
Let's report I/O hang and I/O hang timeout events via QAPI.
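
With this change, a QMP monitor receives events shaped like the following when
a hang begins (the timestamp values here are illustrative):

    {"timestamp": {"seconds": 1603371776, "microseconds": 123456},
     "event": "BLOCK_IO_HANG", "data": {"set": true}}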

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 block/block-backend.c |  3 +++
 qapi/block-core.json  | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index c812b3a9c7..42337ceb04 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2556,6 +2556,7 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status)
             /* Case when I/O Hang is recovered */
             blk->is_iohang_timeout = false;
             blk->iohang_time = 0;
+            qapi_event_send_block_io_hang(false);
         }
         break;
     case BLOCK_IO_HANG_STATUS_HANG:
@@ -2563,12 +2564,14 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status)
             /* Case when I/O hang is first triggered */
             blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
             need_rehandle = true;
+            qapi_event_send_block_io_hang(true);
         } else {
             if (!blk->is_iohang_timeout) {
                 now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
                 if (now >= (blk->iohang_time + blk->iohang_timeout)) {
                     /* Case when I/O hang is timeout */
                     blk->is_iohang_timeout = true;
+                    qapi_event_send_block_io_hang_timeout(true);
                 } else {
                     /* Case when I/O hang is continued */
                     need_rehandle = true;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ee5ebef7f2..709514498c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5379,3 +5379,29 @@
 { 'command': 'blockdev-snapshot-delete-internal-sync',
   'data': { 'device': 'str', '*id': 'str', '*name': 'str'},
   'returns': 'SnapshotInfo' }
+
+##
+# @BLOCK_IO_HANG:
+#
+# Emitted when a device I/O hang begins or ends
+#
+# @set: true if the I/O hang begins; false if it ends.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG',
+  'data': { 'set': 'bool' }}
+
+##
+# @BLOCK_IO_HANG_TIMEOUT:
+#
+# Emitted when a device I/O hang timeout is set or cleared
+#
+# @set: true if the timeout is set; false if it is cleared.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG_TIMEOUT',
+  'data': { 'set': 'bool' }}
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 9/9] docs: add a doc about I/O hang
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (7 preceding siblings ...)
  2020-10-22 13:03 ` [PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event Jiahui Cen
@ 2020-10-22 13:03 ` Jiahui Cen
  2020-10-26 16:53 ` [PATCH v3 0/9] block-backend: Introduce " Stefan Hajnoczi
  9 siblings, 0 replies; 14+ messages in thread
From: Jiahui Cen @ 2020-10-22 13:03 UTC (permalink / raw)
  To: qemu-devel, kwolf, mreitz, eblake
  Cc: cenjiahui, zhang.zhanghailiang, qemu-block, stefanha, fangying1, jsnow

Give some details about I/O hang and how to use it.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
---
 docs/io-hang.rst | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 docs/io-hang.rst

diff --git a/docs/io-hang.rst b/docs/io-hang.rst
new file mode 100644
index 0000000000..53356ceed2
--- /dev/null
+++ b/docs/io-hang.rst
@@ -0,0 +1,45 @@
+========
+I/O hang
+========
+
+Introduction
+============
+
+I/O hang is a mechanism that automatically rehandles AIOs when an error occurs
+on the backend block device, in a way that is imperceptible to the guest. If the
+error on the backend device is temporary, e.g. NFS returning EIO due to network
+fluctuations, the AIOs are resent automatically once the device recovers and
+complete successfully. If the error on the device is unrecoverable, the failed
+AIOs are returned to the guest after the rehandle timeout.
+
+This mechanism provides self-recovery and high availability for the VM. From
+the guest's point of view, AIOs sent to the device hang for a while and then
+finish, and any unrelated service in the guest keeps working as usual.
+
+Implementation
+==============
+
+Each BlockBackend has a list that stores the AIOs which need to be rehandled
+and a timer that monitors the timeout.
+
+If I/O hang is enabled, all completed AIOs are checked and the failed ones are
+inserted into the BlockBackend's list. The timer is then started and the AIOs
+in the list are resent to the backend device. Once an AIO on this backend
+device succeeds, it is returned to the guest. Otherwise, the AIOs are rehandled
+periodically until the timeout expires.
+
+I/O hang provides interfaces to drain all the AIOs in the BlockBackend's list.
+This is needed in some situations, e.g. when resetting virtio-blk.
+
+I/O hang also provides QAPI events, which management tools like libvirt can
+use for monitoring.
+
+Examples
+========
+
+Enable I/O hang when launching QEMU::
+
+      $ qemu-system-x86_64                                      \
+          -machine pc,accel=kvm -smp 1 -m 2048                  \
+          -drive file=shared.img,format=raw,cache=none,aio=native,iohang-timeout=60
+
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
  2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
                   ` (8 preceding siblings ...)
  2020-10-22 13:03 ` [PATCH v3 9/9] docs: add a doc about I/O hang Jiahui Cen
@ 2020-10-26 16:53 ` Stefan Hajnoczi
  2020-10-29  9:42   ` cenjiahui
  9 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2020-10-26 16:53 UTC (permalink / raw)
  To: Jiahui Cen
  Cc: kwolf, zhang.zhanghailiang, qemu-block, qemu-devel, mreitz,
	fangying1, jsnow


On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> A VM in a cloud environment may use a virtual disk as its backend storage,
> and there are usually filesystems on the virtual block device. When the backend
> storage is temporarily down, any I/O issued to the virtual block device will
> cause an error. For example, an error in an ext4 filesystem can make the
> filesystem read-only. However, cloud backend storage can often be recovered
> quickly; for example, an IP-SAN may go down due to a network failure and come
> back online soon after the network recovers. The error in the filesystem,
> though, may not be recovered without a device reattach or a system restart. So
> an I/O rehandle mechanism is needed to implement self-healing.
> 
> This patch series proposes a feature called I/O hang. It rehandles AIOs that
> fail with EIO instead of returning the error to the guest. From the guest's
> perspective, it just looks as if the I/O is hanging rather than failing. With
> this feature enabled, the guest resumes running smoothly once I/O recovers.

Hi,
This feature seems like an extension of the existing -drive
rerror=/werror= parameters:

  werror=action,rerror=action
      Specify which action to take on write and read errors. Valid
      actions are: "ignore" (ignore the error and try to continue),
      "stop" (pause QEMU), "report" (report the error to the guest),
      "enospc" (pause QEMU only if the host disk is full; report the
      error to the guest otherwise).  The default setting is
      werror=enospc and rerror=report.

That mechanism already has a list of requests to retry and live
migration integration. Using the werror=/rerror= mechanism would avoid
code duplication between these features. You could add a
werror/rerror=retry error action for this feature.
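
For instance, that could eventually look something like this (the "retry"
value is only a sketch of the proposed extension, not an action QEMU supports
today):

  -drive file=disk.img,format=raw,werror=retry,rerror=retry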

Does that sound good?

Stefan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
  2020-10-26 16:53 ` [PATCH v3 0/9] block-backend: Introduce " Stefan Hajnoczi
@ 2020-10-29  9:42   ` cenjiahui
  2020-10-30 13:21     ` Stefan Hajnoczi
  0 siblings, 1 reply; 14+ messages in thread
From: cenjiahui @ 2020-10-29  9:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, zhang.zhanghailiang, qemu-block, qemu-devel, mreitz,
	fangying1, jsnow


On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
>> A VM in a cloud environment may use a virtual disk as its backend storage,
>> and there are usually filesystems on the virtual block device. When the backend
>> storage is temporarily down, any I/O issued to the virtual block device will
>> cause an error. For example, an error in an ext4 filesystem can make the
>> filesystem read-only. However, cloud backend storage can often be recovered
>> quickly; for example, an IP-SAN may go down due to a network failure and come
>> back online soon after the network recovers. The error in the filesystem,
>> though, may not be recovered without a device reattach or a system restart. So
>> an I/O rehandle mechanism is needed to implement self-healing.
>>
>> This patch series proposes a feature called I/O hang. It rehandles AIOs that
>> fail with EIO instead of returning the error to the guest. From the guest's
>> perspective, it just looks as if the I/O is hanging rather than failing. With
>> this feature enabled, the guest resumes running smoothly once I/O recovers.
> 
> Hi,
> This feature seems like an extension of the existing -drive
> rerror=/werror= parameters:
> 
>   werror=action,rerror=action
>       Specify which action to take on write and read errors. Valid
>       actions are: "ignore" (ignore the error and try to continue),
>       "stop" (pause QEMU), "report" (report the error to the guest),
>       "enospc" (pause QEMU only if the host disk is full; report the
>       error to the guest otherwise).  The default setting is
>       werror=enospc and rerror=report.
> 
> That mechanism already has a list of requests to retry and live
> migration integration. Using the werror=/rerror= mechanism would avoid
> code duplication between these features. You could add a
> werror/rerror=retry error action for this feature.
> 
> Does that sound good?
> 
> Stefan
> 

Hi Stefan,

Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
way to implement the retry feature.

However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
provides the ACTION, and the actual error handling needs to be implemented
separately in the device layer for each kind of device, while our I/O hang
mechanism handles AIO errors directly regardless of the device type. Wouldn't it
be more generic to implement the feature in the block-backend layer? In
particular, we can keep the retry timeout in the common BlockBackend structure.

Besides, is there any reason why QEMU implements the rerror=/werror= mechanism
in the device layer rather than in the block-backend layer?

Jiahui


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
  2020-10-29  9:42   ` cenjiahui
@ 2020-10-30 13:21     ` Stefan Hajnoczi
  2020-11-03 12:19       ` cenjiahui
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2020-10-30 13:21 UTC (permalink / raw)
  To: cenjiahui
  Cc: kwolf, zhang.zhanghailiang, qemu-block, qemu-devel, mreitz,
	fangying1, jsnow


On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
> 
> On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> > On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> >> A VM in a cloud environment may use a virtual disk as its backend storage,
> >> and there are usually filesystems on the virtual block device. When the backend
> >> storage is temporarily down, any I/O issued to the virtual block device will
> >> cause an error. For example, an error in an ext4 filesystem can make the
> >> filesystem read-only. However, cloud backend storage can often be recovered
> >> quickly; for example, an IP-SAN may go down due to a network failure and come
> >> back online soon after the network recovers. The error in the filesystem,
> >> though, may not be recovered without a device reattach or a system restart. So
> >> an I/O rehandle mechanism is needed to implement self-healing.
> >>
> >> This patch series proposes a feature called I/O hang. It rehandles AIOs that
> >> fail with EIO instead of returning the error to the guest. From the guest's
> >> perspective, it just looks as if the I/O is hanging rather than failing. With
> >> this feature enabled, the guest resumes running smoothly once I/O recovers.
> > 
> > Hi,
> > This feature seems like an extension of the existing -drive
> > rerror=/werror= parameters:
> > 
> >   werror=action,rerror=action
> >       Specify which action to take on write and read errors. Valid
> >       actions are: "ignore" (ignore the error and try to continue),
> >       "stop" (pause QEMU), "report" (report the error to the guest),
> >       "enospc" (pause QEMU only if the host disk is full; report the
> >       error to the guest otherwise).  The default setting is
> >       werror=enospc and rerror=report.
> > 
> > That mechanism already has a list of requests to retry and live
> > migration integration. Using the werror=/rerror= mechanism would avoid
> > code duplication between these features. You could add a
> > werror/rerror=retry error action for this feature.
> > 
> > Does that sound good?
> > 
> > Stefan
> > 
> 
> Hi Stefan,
> 
> Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
> way to implement the retry feature.
> 
> However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
> provides the ACTION, and the actual error handling needs to be implemented
> separately in the device layer for each kind of device, while our I/O hang
> mechanism handles AIO errors directly regardless of the device type. Wouldn't it
> be more generic to implement the feature in the block-backend layer? In
> particular, we can keep the retry timeout in the common BlockBackend structure.
> 
> Besides, is there any reason why QEMU implements the rerror=/werror= mechanism
> in the device layer rather than in the block-backend layer?

Yes, it's because failed requests can be live-migrated and retried on
the destination host. In other words, live migration still works even
when there are failed requests.

There may be things that can be refactored so there is less duplication
in devices, but the basic design goal is that the block layer doesn't
keep track of failed requests because they are live migrated together
with the device state.

Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.

Stefan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
  2020-10-30 13:21     ` Stefan Hajnoczi
@ 2020-11-03 12:19       ` cenjiahui
  0 siblings, 0 replies; 14+ messages in thread
From: cenjiahui @ 2020-11-03 12:19 UTC (permalink / raw)
  To: kwolf
  Cc: zhang.zhanghailiang, qemu-block, qemu-devel, mreitz,
	Stefan Hajnoczi, fangying1, jsnow


On 2020/10/30 21:21, Stefan Hajnoczi wrote:
> On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
>>
>> On 2020/10/27 0:53, Stefan Hajnoczi wrote:
>>> On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
>>>> A VM in a cloud environment may use a virtual disk as its backend storage,
>>>> and there are usually filesystems on the virtual block device. When the backend
>>>> storage is temporarily down, any I/O issued to the virtual block device will
>>>> cause an error. For example, an error in an ext4 filesystem can make the
>>>> filesystem read-only. However, cloud backend storage can often be recovered
>>>> quickly; for example, an IP-SAN may go down due to a network failure and come
>>>> back online soon after the network recovers. The error in the filesystem,
>>>> though, may not be recovered without a device reattach or a system restart. So
>>>> an I/O rehandle mechanism is needed to implement self-healing.
>>>>
>>>> This patch series proposes a feature called I/O hang. It rehandles AIOs that
>>>> fail with EIO instead of returning the error to the guest. From the guest's
>>>> perspective, it just looks as if the I/O is hanging rather than failing. With
>>>> this feature enabled, the guest resumes running smoothly once I/O recovers.
>>>
>>> Hi,
>>> This feature seems like an extension of the existing -drive
>>> rerror=/werror= parameters:
>>>
>>>   werror=action,rerror=action
>>>       Specify which action to take on write and read errors. Valid
>>>       actions are: "ignore" (ignore the error and try to continue),
>>>       "stop" (pause QEMU), "report" (report the error to the guest),
>>>       "enospc" (pause QEMU only if the host disk is full; report the
>>>       error to the guest otherwise).  The default setting is
>>>       werror=enospc and rerror=report.
>>>
>>> That mechanism already has a list of requests to retry and live
>>> migration integration. Using the werror=/rerror= mechanism would avoid
>>> code duplication between these features. You could add a
>>> werror/rerror=retry error action for this feature.
>>>
>>> Does that sound good?
>>>
>>> Stefan
>>>
>>
>> Hi Stefan,
>>
>> Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
>> way to implement the retry feature.
>>
>> However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
>> provides the ACTION, and the actual error handling needs to be implemented
>> separately in the device layer for each kind of device, while our I/O hang
>> mechanism handles AIO errors directly regardless of the device type. Wouldn't it
>> be more generic to implement the feature in the block-backend layer? In
>> particular, we can keep the retry timeout in the common BlockBackend structure.
>>
>> Besides, is there any reason why QEMU implements the rerror=/werror= mechanism
>> in the device layer rather than in the block-backend layer?
> 
> Yes, it's because failed requests can be live-migrated and retried on
> the destination host. In other words, live migration still works even
> when there are failed requests.
> 
> There may be things that can be refactored so there is less duplication
> in devices, but the basic design goal is that the block layer doesn't
> keep track of failed requests because they are live migrated together
> with the device state.
> 
> Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.
> 
> Stefan
> 

Hi Kevin,

What do you think about extending rerror=/werror= for the retry feature?

And which place is better for setting the retry timeout: the BlockBackend in
the block layer, or the per-device structure in the device layer?

Jiahui


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-03 12:21 UTC | newest]

Thread overview: 14+ messages
2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 1/9] block-backend: introduce I/O rehandle info Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 2/9] block-backend: rehandle block aios when EIO Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 3/9] block-backend: add I/O hang timeout Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 5/9] block-backend: enable I/O hang when timeout is set Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 6/9] virtio-blk: pause I/O hang when resetting Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 7/9] qemu-option: add I/O hang timeout option Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 9/9] docs: add a doc about I/O hang Jiahui Cen
2020-10-26 16:53 ` [PATCH v3 0/9] block-backend: Introduce " Stefan Hajnoczi
2020-10-29  9:42   ` cenjiahui
2020-10-30 13:21     ` Stefan Hajnoczi
2020-11-03 12:19       ` cenjiahui
