qemu-devel.nongnu.org archive mirror
From: cenjiahui <cenjiahui@huawei.com>
To: <kwolf@redhat.com>
Cc: zhang.zhanghailiang@huawei.com, qemu-block@nongnu.org,
	qemu-devel@nongnu.org, mreitz@redhat.com,
	Stefan Hajnoczi <stefanha@redhat.com>,
	fangying1@huawei.com, jsnow@redhat.com
Subject: Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
Date: Tue, 3 Nov 2020 20:19:32 +0800
Message-ID: <3b815863-3a39-073a-e871-44a5df3c9635@huawei.com>
In-Reply-To: <20201030132153.GB320132@stefanha-x1.localdomain>


On 2020/10/30 21:21, Stefan Hajnoczi wrote:
> On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
>>
>> On 2020/10/27 0:53, Stefan Hajnoczi wrote:
>>> On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
>>>> A VM in a cloud environment may use a virtual disk as its backend storage,
>>>> and there are usually filesystems on the virtual block device. When the backend
>>>> storage is temporarily down, any I/O issued to the virtual block device will
>>>> cause an error. For example, an error in an ext4 filesystem would make
>>>> the filesystem read-only. However, cloud backend storage can often recover soon.
>>>> For example, an IP-SAN may be down due to a network failure and will be online
>>>> again soon after the network recovers. The error in the filesystem may not be
>>>> recoverable without a device reattach or a system restart. So an I/O rehandle
>>>> mechanism is needed to implement self-healing.
>>>>
>>>> This patch series proposes a feature called I/O hang. It can rehandle AIOs
>>>> that fail with EIO without sending the error back to the guest. From the guest's
>>>> point of view, it looks as if an I/O is hanging and has not returned. With this
>>>> feature enabled, the guest can resume running smoothly once I/O has recovered.
>>>
>>> Hi,
>>> This feature seems like an extension of the existing -drive
>>> rerror=/werror= parameters:
>>>
>>>   werror=action,rerror=action
>>>       Specify which action to take on write and read errors. Valid
>>>       actions are: "ignore" (ignore the error and try to continue),
>>>       "stop" (pause QEMU), "report" (report the error to the guest),
>>>       "enospc" (pause QEMU only if the host disk is full; report the
>>>       error to the guest otherwise).  The default setting is
>>>       werror=enospc and rerror=report.
>>>
>>> That mechanism already has a list of requests to retry and live
>>> migration integration. Using the werror=/rerror= mechanism would avoid
>>> code duplication between these features. You could add a
>>> werror/rerror=retry error action for this feature.
>>>
>>> Does that sound good?
>>>
>>> Stefan
>>>
>>
>> Hi Stefan,
>>
>> Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
>> way for the retry feature.
>>
>> However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
>> provides the ACTION, and the actual error handler needs to be implemented
>> separately in the device layer for each device. Our I/O hang mechanism, in
>> contrast, handles AIO errors directly, regardless of the device type. Would it
>> be more generic to implement the feature in the block-backend layer? In
>> particular, we could set the retry timeout in a common structure, BlockBackend.
>>
>> Besides, is there any reason that QEMU implements the rerror=/werror= mechanism
>> in the device layer rather than in the block-backend layer?
> 
> Yes, it's because failed requests can be live-migrated and retried on
> the destination host. In other words, live migration still works even
> when there are failed requests.
> 
> There may be things that can be refactored so there is less duplication
> in devices, but the basic design goal is that the block layer doesn't
> keep track of failed requests because they are live migrated together
> with the device state.
> 
> Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.
> 
> Stefan
> 

Hi Kevin,

What do you think about extending rerror=/werror= for the retry feature?

And where would be the better place to set the retry timeout: BlockBackend in
the block layer, or a per-device structure in the device layer?
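
To make the question concrete, here is a rough sketch of the decision a
"retry" error action with a timeout would have to make, wherever the settings
end up living. It is not code from the series or from QEMU: every name in it
(SketchErrorAction, SketchRetryPolicy, sketch_get_error_action) is
hypothetical, and it assumes that only EIO is retried, that a timeout of 0
leaves retry disabled, and that the error is reported to the guest once the
timeout expires.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical error actions. RETRY is the proposed addition,
 * not an existing rerror=/werror= value. */
typedef enum {
    SKETCH_ACTION_REPORT,   /* report the error to the guest */
    SKETCH_ACTION_STOP,     /* pause the VM */
    SKETCH_ACTION_RETRY,    /* keep the request and resubmit it later */
} SketchErrorAction;

/* Hypothetical settings; they could live in BlockBackend or in a
 * per-device structure, which is exactly the open question. */
typedef struct {
    SketchErrorAction on_error;
    int64_t retry_timeout_ms;   /* assumption: 0 means retry is disabled */
} SketchRetryPolicy;

/*
 * Decide what to do with a failed request.
 * error:      positive errno value of the failure
 * elapsed_ms: time since the request first failed
 */
static SketchErrorAction
sketch_get_error_action(const SketchRetryPolicy *p, int error,
                        int64_t elapsed_ms)
{
    assert(p->retry_timeout_ms >= 0 && elapsed_ms >= 0);

    if (p->on_error != SKETCH_ACTION_RETRY) {
        return p->on_error;             /* other actions pass through */
    }
    if (error != EIO) {
        return SKETCH_ACTION_REPORT;    /* assumption: only EIO is retried */
    }
    if (elapsed_ms >= p->retry_timeout_ms) {
        return SKETCH_ACTION_REPORT;    /* disabled (0) or timed out */
    }
    return SKETCH_ACTION_RETRY;
}
```

With a policy of { SKETCH_ACTION_RETRY, 5000 }, an EIO that first failed 100 ms
ago would be retried, while the same EIO at 5000 ms or later would be reported
to the guest. The logic is the same in either layer; what differs is who owns
the policy and the list of failed requests.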

Jiahui


Thread overview: 14+ messages
2020-10-22 13:02 [PATCH v3 0/9] block-backend: Introduce I/O hang Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 1/9] block-backend: introduce I/O rehandle info Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 2/9] block-backend: rehandle block aios when EIO Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 3/9] block-backend: add I/O hang timeout Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 4/9] block-backend: add I/O rehandle pause/unpause Jiahui Cen
2020-10-22 13:02 ` [PATCH v3 5/9] block-backend: enable I/O hang when timeout is set Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 6/9] virtio-blk: pause I/O hang when resetting Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 7/9] qemu-option: add I/O hang timeout option Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 8/9] qapi: add I/O hang and I/O hang timeout qapi event Jiahui Cen
2020-10-22 13:03 ` [PATCH v3 9/9] docs: add a doc about I/O hang Jiahui Cen
2020-10-26 16:53 ` [PATCH v3 0/9] block-backend: Introduce " Stefan Hajnoczi
2020-10-29  9:42   ` cenjiahui
2020-10-30 13:21     ` Stefan Hajnoczi
2020-11-03 12:19       ` cenjiahui [this message]
