Date: Sat, 08 May 2021 10:15:36 +0100
Message-ID: <874kfdvb5z.wl-maz@kernel.org>
From: Marc Zyngier
To: Jason Wang
Cc: Zhu Lingshan, Shaokun Zhang, kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, linux-pci@vger.kernel.org, Alex Williamson, Cornelia Huck, Nianyao Tang, Bjorn Helgaas, Eric Auger, "Michael S. Tsirkin"
Subject: Re: Question on guest enable msi fail when using GICv4/4.1
In-Reply-To: <373c70d3-eda3-8e84-d138-2f90d4e55217@redhat.com>
References: <3a2c66d6-6ca0-8478-d24b-61e8e3241b20@hisilicon.com> <87k0oaq5jf.wl-maz@kernel.org> <878s4qq00u.wl-maz@kernel.org> <874kfepht4.wl-maz@kernel.org> <373c70d3-eda3-8e84-d138-2f90d4e55217@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

On Sat, 08 May 2021 02:51:39 +0100,
Jason Wang wrote:
>
>
> On 2021/5/8 1:36 AM, Marc Zyngier wrote:

[...]

> > Adding Zhu, Jason, MST to the party. It all seems to be caused by this
> > commit:
> >
> > commit a979a6aa009f3c99689432e0cdb5402a4463fb88
> > Author: Zhu Lingshan
> > Date:   Fri Jul 31 14:55:33 2020 +0800
> >
> >     irqbypass: do not start cons/prod when failed connect
> >
> >     If failed to connect, there is no need to start consumer nor
> >     producer.
> >
> >     Signed-off-by: Zhu Lingshan
> >     Suggested-by: Jason Wang
> >     Link: https://lore.kernel.org/r/20200731065533.4144-7-lingshan.zhu@intel.com
> >     Signed-off-by: Michael S. Tsirkin
> >
> >
> > Zhu, I'd really like to understand why you think it is OK not to
> > restart consumer and producers when a connection has failed to be
> > established between the two?
>
>
> My bad, I didn't check ARM code but it's not easy to infer that the
> cons->start/stop is not a per consumer specific operation but a global
> one like VM halting/resuming.

I don't disagree that it is a bit of an odd behaviour, and maybe we
can eventually relax this. However, my rule of thumb for error
handling is to try and put things back in the state you found them.

It is also unfortunate that this same commit introduces an interesting
bug by unconditionally calling del_producer(), even if the
producer/consumer connection has succeeded. I guess it is a good thing
that nobody seems to implement any of the producer callbacks.

> > In the case of KVM/arm64, this results in the guest being forever
> > suspended and never resumed. That's obviously not an acceptable
> > regression, as there is a number of benign reasons for a connect to
> > fail.
>
>
> Let's revert this commit.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
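The reply states an error-handling rule of thumb: put things back in the state you found them. The C sketch below shows one way to apply that rule to the connect path under discussion. The names `sketch_producer`, `sketch_consumer`, `sketch_connect` and the example callbacks are hypothetical simplifications, not the kernel's irqbypass code. Stop/start is modelled as a plain `stopped` flag.

The sketch only undoes the producer-side link when the consumer-side step fails. It also restarts both sides on every path, so it avoids both problems raised in the reply: a consumer left stopped after a failed connect, and a teardown that runs even on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for an irq bypass producer/consumer.
 * These are NOT the kernel's struct layouts. */
struct sketch_producer;
struct sketch_consumer;

struct sketch_producer {
	int stopped;   /* nonzero while the producer is stopped */
	int del_calls; /* counts how often the undo callback ran */
	int (*add_consumer)(struct sketch_producer *, struct sketch_consumer *);
	void (*del_consumer)(struct sketch_producer *, struct sketch_consumer *);
};

struct sketch_consumer {
	int stopped; /* nonzero while the consumer (e.g. the VM) is stopped */
	int (*add_producer)(struct sketch_consumer *, struct sketch_producer *);
};

static int sketch_connect(struct sketch_producer *prod,
			  struct sketch_consumer *cons)
{
	int ret = 0;

	/* In this sketch the consumer-side hook is assumed mandatory. */
	assert(cons->add_producer != NULL);

	/* Quiesce both sides before touching the link. */
	prod->stopped = 1;
	cons->stopped = 1;

	if (prod->add_consumer)
		ret = prod->add_consumer(prod, cons);

	if (!ret) {
		ret = cons->add_producer(cons, prod);
		/* Undo the step that succeeded, but only on failure. */
		if (ret && prod->del_consumer)
			prod->del_consumer(prod, cons);
	}

	/* Restart unconditionally: success or failure, both sides are
	 * left running, as they were before the call. */
	cons->stopped = 0;
	prod->stopped = 0;

	return ret;
}

/* Hypothetical example callbacks used to exercise the sketch. */
static int sketch_add_consumer_ok(struct sketch_producer *p,
				  struct sketch_consumer *c)
{
	(void)p; (void)c;
	return 0;
}

static void sketch_del_consumer_count(struct sketch_producer *p,
				      struct sketch_consumer *c)
{
	(void)c;
	p->del_calls++;
}

static int sketch_add_producer_ok(struct sketch_consumer *c,
				  struct sketch_producer *p)
{
	(void)c; (void)p;
	return 0;
}

static int sketch_add_producer_fail(struct sketch_consumer *c,
				    struct sketch_producer *p)
{
	(void)c; (void)p;
	return -EBUSY;
}
```

In the failing case, `sketch_connect()` returns the error, runs the undo callback once and leaves both `stopped` flags at 0. In the succeeding case it returns 0 without calling the undo callback.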