From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934285AbdDFJeM (ORCPT ); Thu, 6 Apr 2017 05:34:12 -0400
Received: from cn.fujitsu.com ([59.151.112.132]:58449 "EHLO heian.cn.fujitsu.com"
        rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP
        id S1754291AbdDFIkU (ORCPT ); Thu, 6 Apr 2017 04:40:20 -0400
X-IronPort-AV: E=Sophos;i="5.22,518,1449504000"; d="scan'208";a="17425377"
Subject: Re: [PATCH v6] vfio error recovery: kernel support
To: "Michael S. Tsirkin"
References: <1490260051-6046-1-git-send-email-caoj.fnst@cn.fujitsu.com>
        <20170324161238.366ce6a7@t450s.home>
        <58DA6954.2000601@cn.fujitsu.com>
        <20170328101233.74f50a92@t450s.home>
        <20170329000148.GA18849@redhat.com>
        <20170328205513.21b97381@t450s.home>
        <20170330205823-mutt-send-email-mst@kernel.org>
        <20170330121652.2ac8fa62@t450s.home>
        <58E4B0C9.50109@cn.fujitsu.com>
        <20170406005028-mutt-send-email-mst@kernel.org>
CC: Alex Williamson, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
        qemu-devel@nongnu.org, izumi.taku@jp.fujitsu.com
From: Cao jin
Message-ID: <58E6011F.6030002@cn.fujitsu.com>
Date: Thu, 6 Apr 2017 16:49:35 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <20170406005028-mutt-send-email-mst@kernel.org>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.167.226.69]
X-yoursite-MailScanner-ID: 97E6346701F2.A5AD8
X-yoursite-MailScanner: Found to be clean
X-yoursite-MailScanner-From: caoj.fnst@cn.fujitsu.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/06/2017 05:56 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 05, 2017 at 04:54:33PM +0800, Cao jin wrote:
>> Apparently, I don't have experience to induce non-fatal error, device
>> error is more of a chance related with the environment(temperature,
>> humidity, etc) as I understand.
>
> I'm not sure how to interpret this statement.
> I think what Alex is
> saying is simply that patches should include some justification. They
> make changes but what are they improving?
> For example:
>
>     I tested device ABC in conditions DEF. Without a patch VM
>     stops. With the patches applied VM recovers and proceeds to
>     use the device normally.
>
> is one reasonable justification imho.
>

Got it. But unfortunately, I have not yet seen a VM stop caused by a real
device non-fatal error during device assignment (I have only seen real fatal
errors after starting a VM). On the one hand, AER errors can occur in theory;
on the other hand, few people have seen a VM stop caused by AER. Now I am
being asked whether I have real evidence or a scenario to prove that this
patchset is really useful. I don't, and we all know it is hard to trigger a
real hardware error, so it seems I am pushed into a corner.

I guess these questions also apply to the AER driver's author: if the
scenario were easy to reproduce, there would be no need to write aer_inject
to fake errors.

-- 
Sincerely,
Cao jin