Date: Fri, 07 May 2021 12:02:57 +0100
Message-ID: <878s4qq00u.wl-maz@kernel.org>
From: Marc Zyngier
To: Shaokun Zhang
Cc: Alex Williamson, Cornelia Huck, Nianyao Tang, Bjorn Helgaas, Eric Auger
Subject: Re: Question on guest enable msi fail when using GICv4/4.1
References: <3a2c66d6-6ca0-8478-d24b-61e8e3241b20@hisilicon.com> <87k0oaq5jf.wl-maz@kernel.org>
X-Mailing-List: kvm@vger.kernel.org

On Fri, 07 May 2021 10:58:23 +0100,
Shaokun Zhang wrote:
>
> Hi Marc,
>
> Thanks for your quick reply.
>
> On 2021/5/7 17:03, Marc Zyngier wrote:
> > On Fri, 07 May 2021 06:57:04 +0100,
> > Shaokun Zhang wrote:
> >>
> >> [This letter comes from Nianyao Tang]
> >>
> >> Hi,
> >>
> >> Using GICv4/4.1 and msi capability, guest vf driver requires 3
> >> vectors and enable msi, will lead to guest stuck.
> >
> > Stuck how?
>
> Guest serial does not response anymore and guest network shutdown.
>
> >
> >> Qemu gets number of interrupts from Multiple Message Capable field
> >> set by guest. This field is aligned to a power of 2(if a function
> >> requires 3 vectors, it initializes it to 2).
> >
> > So I guess this is a MultiMSI device with 4 vectors, right?
> >
>
> Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
>
> >> However, guest driver just sends 3 mapi-cmd to vits and 3 ite
> >> entries is recorded in host. Vfio initializes msi interrupts using
> >> the number of interrupts 4 provide by qemu. When it comes to the
> >> 4th msi without ite in vits, in irq_bypass_register_producer,
> >> producer and consumer will __connect fail, due to find_ite fail, and
> >> do not resume guest.
> >
> > Let me rephrase this to check that I understand it:
> > - The device has 4 vectors
> > - The guest only create mappings for 3 of them
> > - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
> > - KVM doesn't have a mapping for the 4th vector and returns an error
> > - VFIO disable this 4th vector
> >
> > Is that correct? If yes, I don't understand why that impacts the guest
> > at all. From what I can see, vfio_msi_set_vector_signal() just prints
> > a message on the console and carries on.
> >
>
> function calls:
> --> vfio_msi_set_vector_signal
>    --> irq_bypass_register_producer
>       -->__connect
>
> in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
> and fails to get the 4th mapping. When add_producer fail, it does
> not call cons->start, calls kvm_arch_irq_bypass_start and then
> kvm_arm_resume_guest.

[+Eric, who wrote the irq_bypass infrastructure.]

Ah, so the guest is actually paused, not in a livelock situation
(which is how I interpreted "stuck"). I think we should handle this
case gracefully, as there should be no expectation that the guest will
be using this interrupt.

Given that VFIO seems to be pretty unfazed when a producer fails, I'm
tempted to do the same thing and restart the guest.

Also, __disconnect doesn't care about errors, so why should __connect
have this odd behaviour?

Can you please try this? It is completely untested (and I think the
del_consumer call is odd, which is why I've also dropped it).

Eric, what do you think?

Thanks,

	M.

diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index c9bb3957f58a..7e1865e15668 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,21 +40,14 @@ static int __connect(struct irq_bypass_producer *prod,
 
 	if (prod->add_consumer)
 		ret = prod->add_consumer(prod, cons);
 
-	if (ret)
-		goto err_add_consumer;
-
-	ret = cons->add_producer(cons, prod);
-	if (ret)
-		goto err_add_producer;
+	if (!ret)
+		ret = cons->add_producer(cons, prod);
 
 	if (cons->start)
 		cons->start(cons);
 	if (prod->start)
 		prod->start(prod);
-err_add_producer:
-	if (prod->del_consumer)
-		prod->del_consumer(prod, cons);
-err_add_consumer:
+
 	return ret;
 }

-- 
Without deviation from the norm, progress is not possible.