qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: cenjiahui <cenjiahui@huawei.com>
Cc: kwolf@redhat.com, zhang.zhanghailiang@huawei.com,
	qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com,
	fangying1@huawei.com, jsnow@redhat.com
Subject: Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
Date: Fri, 30 Oct 2020 13:21:53 +0000
Message-ID: <20201030132153.GB320132@stefanha-x1.localdomain>
In-Reply-To: <b5aef6d9-bc2c-1cf4-b392-5db37049df33@huawei.com>

On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
> 
> On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> > On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> >> A VM in a cloud environment may use a virtual disk as its backend storage,
> >> and there are usually filesystems on the virtual block device. When the backend
> >> storage is temporarily down, any I/O issued to the virtual block device will
> >> cause an error. For example, an error in an ext4 filesystem can make
> >> the filesystem read-only. However, cloud backend storage can recover quickly.
> >> For example, an IP-SAN may go down due to a network failure and come back online
> >> soon after the network recovers. The error in the filesystem may not be
> >> recovered without a device reattach or a system restart. So I/O rehandling is
> >> needed to implement a self-healing mechanism.
> >>
> >> This patch series proposes a feature called I/O hang. It can rehandle AIOs
> >> that fail with EIO without sending the error back to the guest. From the guest's
> >> perspective, it looks like an I/O request is hanging and has not returned. With
> >> this feature enabled, the guest can resume running smoothly once I/O recovers.
> > 
> > Hi,
> > This feature seems like an extension of the existing -drive
> > rerror=/werror= parameters:
> > 
> >   werror=action,rerror=action
> >       Specify which action to take on write and read errors. Valid
> >       actions are: "ignore" (ignore the error and try to continue),
> >       "stop" (pause QEMU), "report" (report the error to the guest),
> >       "enospc" (pause QEMU only if the host disk is full; report the
> >       error to the guest otherwise).  The default setting is
> >       werror=enospc and rerror=report.
> > 
> > That mechanism already has a list of requests to retry and live
> > migration integration. Using the werror=/rerror= mechanism would avoid
> > code duplication between these features. You could add a
> > werror/rerror=retry error action for this feature.
> > 
> > Does that sound good?
> > 
> > Stefan
> > 
> 
> Hi Stefan,
> 
> Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
> way to implement the retry feature.
> 
> However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
> provides the ACTION, and the actual error handler needs to be implemented
> separately in the device layer for each device. Our I/O hang mechanism, by
> contrast, handles AIO errors directly, regardless of the device type. Isn't it
> more general to implement the feature in the block-backend layer? In particular,
> we can set the retry timeout in a common structure, BlockBackend.
> 
> Besides, is there any reason that QEMU implements the rerror=/werror= mechanism
> in the device layer rather than in the block-backend layer?

Yes, it's because failed requests can be live-migrated and retried on
the destination host. In other words, live migration still works even
when there are failed requests.

There may be things that can be refactored so there is less duplication
in devices, but the basic design goal is that the block layer doesn't
keep track of failed requests because they are live migrated together
with the device state.
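
To make that concrete, here is a rough sketch in C of the shape of the
idea. All names in it (OnError, ErrorAction, Device, FailedReq,
get_error_action, device_handle_error, device_retry_all) are made up for
illustration and are not QEMU's actual API. It assumes the policy
semantics quoted above from the werror=/rerror= documentation, and it
models only the "stop" path, where the device itself holds the failed
requests so that they travel with the device state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not QEMU's actual API. */

/* Policies selectable with werror=/rerror=. */
typedef enum { ON_ERROR_IGNORE, ON_ERROR_STOP,
               ON_ERROR_REPORT, ON_ERROR_ENOSPC } OnError;

/* What the device should do with one failed request. */
typedef enum { ACTION_IGNORE, ACTION_STOP, ACTION_REPORT } ErrorAction;

/* Resolve the configured policy against the errno of a failed request. */
static ErrorAction get_error_action(OnError policy, int error)
{
    switch (policy) {
    case ON_ERROR_IGNORE:
        return ACTION_IGNORE;
    case ON_ERROR_STOP:
        return ACTION_STOP;
    case ON_ERROR_ENOSPC:
        /* Pause only if the host disk is full; report otherwise. */
        return error == ENOSPC ? ACTION_STOP : ACTION_REPORT;
    case ON_ERROR_REPORT:
    default:
        return ACTION_REPORT;
    }
}

typedef struct FailedReq {
    unsigned long sector;
    bool is_write;
    struct FailedReq *next;
} FailedReq;

typedef struct Device {
    OnError rerror;
    OnError werror;
    FailedReq *retry_list; /* lives in device state, so it migrates with it */
} Device;

/* Returns true if the request was queued for a later retry. */
static bool device_handle_error(Device *dev, FailedReq *req, int error)
{
    assert(error != 0);
    OnError policy = req->is_write ? dev->werror : dev->rerror;

    if (get_error_action(policy, error) != ACTION_STOP) {
        return false; /* ignored or reported to the guest, nothing to keep */
    }
    req->next = dev->retry_list;
    dev->retry_list = req;
    return true;
}

/*
 * Called when the VM resumes, possibly on the migration destination.
 * Detaches the list and resubmits each request; returns how many.
 */
static size_t device_retry_all(Device *dev, void (*submit)(FailedReq *))
{
    FailedReq *req = dev->retry_list;
    size_t n = 0;

    dev->retry_list = NULL;
    while (req) {
        FailedReq *next = req->next;
        req->next = NULL;
        if (submit) {
            submit(req);
        }
        n++;
        req = next;
    }
    return n;
}
```

The point of the sketch is only that the retry list is a field of the
device, not of the block layer, which is why a retry error action would
probably fit more naturally next to the existing actions.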

Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.

Stefan

Thread overview: 14+ messages
2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 1/9] block-backend: introduce I/O rehandle info Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 2/9] block-backend: rehandle block aios when EIO Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 3/9] block-backend: add I/O hang timeout Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 5/9] block-backend: enable I/O hang when timeout is set Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 6/9] virtio-blk: pause I/O hang when resetting Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 7/9] qemu-option: add I/O hang timeout option Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 9/9] docs: add a doc about I/O hang Jiahui Cen
2020-10-26 16:53 ` [PATCH v3 0/9] block-backend: Introduce " Stefan Hajnoczi
2020-10-29  9:42   ` cenjiahui
2020-10-30 13:21     ` Stefan Hajnoczi [this message]
2020-11-03 12:19       ` cenjiahui
