Subject: Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
From: cenjiahui
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, zhang.zhanghailiang@huawei.com, qemu-block@nongnu.org, qemu-devel@nongnu.org, mreitz@redhat.com, fangying1@huawei.com, jsnow@redhat.com
Date: Thu, 29 Oct 2020 17:42:42 +0800
References: <20201022130303.1092-1-cenjiahui@huawei.com> <20201026165341.GM52035@stefanha-x1.localdomain>
In-Reply-To: <20201026165341.GM52035@stefanha-x1.localdomain>

On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
>> A VM in the cloud environment may use a virtual disk as the backend storage,
>> and there are usually filesystems on the virtual block device. When backend
>> storage is temporarily down, any I/O issued to the virtual block device will
>> cause an error.
>> For example, an error occurring in an ext4 filesystem would make
>> the filesystem read-only. However, a cloud backend storage can be recovered soon.
>> For example, an IP-SAN may be down due to a network failure and will be online
>> soon after the network is recovered. The error in the filesystem may not be
>> recovered without a device reattach or system restart. So an I/O rehandle is
>> needed to implement a self-healing mechanism.
>>
>> This patch series proposes a feature called I/O hang. It can rehandle AIOs
>> with EIO error without sending the error back to the guest. From the guest's
>> perspective, it is just like an I/O is hanging and not returned. The guest can
>> get back to running smoothly when I/O is recovered with this feature enabled.
>
> Hi,
> This feature seems like an extension of the existing -drive
> rerror=/werror= parameters:
>
>   werror=action,rerror=action
>       Specify which action to take on write and read errors. Valid
>       actions are: "ignore" (ignore the error and try to continue),
>       "stop" (pause QEMU), "report" (report the error to the guest),
>       "enospc" (pause QEMU only if the host disk is full; report the
>       error to the guest otherwise). The default setting is
>       werror=enospc and rerror=report.
>
> That mechanism already has a list of requests to retry and live
> migration integration. Using the werror=/rerror= mechanism would avoid
> code duplication between these features. You could add a
> werror/rerror=retry error action for this feature.
>
> Does that sound good?
>
> Stefan
>

Hi Stefan,

Thanks for your reply.

Extending the rerror=/werror= mechanism is a feasible way to implement the
retry feature. However, AFAIK, the rerror=/werror= mechanism in the
block-backend layer only provides the ACTION, and the actual error handler
needs to be implemented separately in the device layer for each device type.
Our I/O hang mechanism, by contrast, handles AIO errors directly, regardless
of the device type. Would it be more general to implement the feature in the
block-backend layer?
In particular, we could set the retry timeout in a common structure,
BlockBackend.

Besides, is there any reason that QEMU implements the rerror=/werror=
mechanism in the device layer rather than in the block-backend layer?

Jiahui
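
To make the point about a common retry timeout concrete, here is a minimal
sketch of retrying EIO in one place, independent of the device type. It is
only an illustration and is not the code from the patch series or any QEMU
API: RetryPolicy, submit_with_retry, and the two timing fields are
hypothetical names, and the blocking loop stands in for what would really be
a timer-driven, asynchronous retry.

```python
import errno
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPolicy:
    """Hypothetical per-backend settings (not actual BlockBackend fields)."""
    retry_interval_s: float = 0.01  # delay between two attempts
    retry_timeout_s: float = 0.1    # stop retrying after this long


def submit_with_retry(do_io: Callable[[], int], policy: RetryPolicy,
                      clock=time.monotonic, sleep=time.sleep) -> int:
    """Call do_io() until it stops returning -EIO or the timeout expires.

    do_io returns 0 on success or a negative errno value, similar to the
    'ret' argument of a completion callback. Only -EIO is retried; any
    other result is passed to the caller immediately.
    """
    deadline = clock() + policy.retry_timeout_s
    while True:
        ret = do_io()
        if ret != -errno.EIO:
            return ret          # success, or an error we do not retry
        if clock() >= deadline:
            return ret          # timed out: the caller finally sees -EIO
        sleep(policy.retry_interval_s)
```

Because the policy lives next to the backend rather than in each device
model, every device that submits I/O through the same path gets the same
retry behaviour and the same timeout.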