kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Auger Eric <eric.auger@redhat.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	linux-pci@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Nianyao Tang <tangnianyao@huawei.com>,
	Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: Question on guest enable msi fail when using GICv4/4.1
Date: Mon, 10 May 2021 10:59:11 +0100	[thread overview]
Message-ID: <87y2cmoqog.wl-maz@kernel.org> (raw)
In-Reply-To: <69cd5989-f4cb-469c-f6a0-3362540e0271@redhat.com>

On Mon, 10 May 2021 09:29:47 +0100,
Auger Eric <eric.auger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/10/21 9:49 AM, Marc Zyngier wrote:
> > Hi Eric,
> > 
> > On Sun, 09 May 2021 18:00:04 +0100,
> > Auger Eric <eric.auger@redhat.com> wrote:
> >>
> >> Hi,
> >> On 5/7/21 1:02 PM, Marc Zyngier wrote:
> >>> On Fri, 07 May 2021 10:58:23 +0100,
> >>> Shaokun Zhang <zhangshaokun@hisilicon.com> wrote:
> >>>>
> >>>> Hi Marc,
> >>>>
> >>>> Thanks for your quick reply.
> >>>>
> >>>> On 2021/5/7 17:03, Marc Zyngier wrote:
> >>>>> On Fri, 07 May 2021 06:57:04 +0100,
> >>>>> Shaokun Zhang <zhangshaokun@hisilicon.com> wrote:
> >>>>>>
> >>>>>> [This letter comes from Nianyao Tang]
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Using GICv4/4.1 and msi capability, guest vf driver requires 3
> >>>>>> vectors and enable msi, will lead to guest stuck.
> >>>>>
> >>>>> Stuck how?
> >>>>
> >>>> Guest serial does not response anymore and guest network shutdown.
> >>>>
> >>>>>
> >>>>>> Qemu gets number of interrupts from Multiple Message Capable field
> >>>>>> set by guest. This field is aligned to a power of 2(if a function
> >>>>>> requires 3 vectors, it initializes it to 2).
> >>>>>
> >>>>> So I guess this is a MultiMSI device with 4 vectors, right?
> >>>>>
> >>>>
> >>>> Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
> >>>>
> >>>>>> However, guest driver just sends 3 mapi-cmd to vits and 3 ite
> >>>>>> entries is recorded in host.  Vfio initializes msi interrupts using
> >>>>>> the number of interrupts 4 provide by qemu.  When it comes to the
> >>>>>> 4th msi without ite in vits, in irq_bypass_register_producer,
> >>>>>> producer and consumer will __connect fail, due to find_ite fail, and
> >>>>>> do not resume guest.
> >>>>>
> >>>>> Let me rephrase this to check that I understand it:
> >>>>> - The device has 4 vectors
> >>>>> - The guest only create mappings for 3 of them
> >>>>> - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
> >>>>> - KVM doesn't have a mapping for the 4th vector and returns an error
> >>>>> - VFIO disable this 4th vector
> >>>>>
> >>>>> Is that correct? If yes, I don't understand why that impacts the guest
> >>>>> at all. From what I can see, vfio_msi_set_vector_signal() just prints
> >>>>> a message on the console and carries on.
> >>>>>
> >>>>
> >>>> function calls:
> >>>> --> vfio_msi_set_vector_signal
> >>>>    --> irq_bypass_register_producer
> >>>>       -->__connect
> >>>>
> >>>> in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
> >>>> and fails to get the 4th mapping. When add_producer fail, it does
> >>>> not call cons->start, calls kvm_arch_irq_bypass_start and then
> >>>> kvm_arm_resume_guest.
> >>>
> >>> [+Eric, who wrote the irq_bypass infrastructure.]
> >>>
> >>> Ah, so the guest is actually paused, not in a livelock situation
> >>> (which is how I interpreted "stuck").
> >>>
> >>> I think we should handle this case gracefully, as there should be no
> >>> expectation that the guest will be using this interrupt. Given that
> >>> VFIO seems to be pretty unfazed when a producer fails, I'm temped to
> >>> do the same thing and restart the guest.
> >>>
> >>> Also, __disconnect doesn't care about errors, so why should __connect
> >>> have this odd behaviour?
> >>
> >> _disconnect() does not care as we should always succeed tearing off
> >> things. del_* ops are void functions. On the opposite we can fail
> >> setting up the bypass.
> >>
> >> Effectively
> >> a979a6aa009f ("irqbypass: do not start cons/prod when failed connect")
> >> needs to be reverted.
> >>
> >> I agree the kerneldoc comments in linux/irqbypass.h may be improved to
> >> better explain the role of stop/start cbs and warn about their potential
> >> global impact.
> > 
> > Yup. It also begs the question of why we have producer callbacks, as
> > nobody seems to use them.
> 
> At the time this was designed, I was working on VFIO platform IRQ
> forwarding using direct EOI and they were used (and useful)
> 
> +	irq->producer.stop = vfio_platform_irq_bypass_stop;
> +	irq->producer.start = vfio_platform_irq_bypass_start;
> 
> [PATCH v4 02/13] VFIO: platform: registration of a dummy IRQ bypass producer
> [PATCH v4 07/13] VFIO: platform: add irq bypass producer management
> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-November/017323.html
> 
> basically the IRQ was disabled and re-enabled. This series has never
> been upstreamed but that's where it originates from.

Ah, I remember those. Something tells me we are eventually going to
need something like that with pKVM (unfortunately, PCI isn't that
popular on Android phones...).

> >> wrt the case above, "in __connect, add_producer finally calls
> >> kvm_vgic_v4_set_forwarding and fails to get the 4th mapping", shouldn't
> >> we succeed in that case?
> > 
> > From a KVM perspective, we can't return a success because there is no
> > guest LPI that matches the input signal.
> right, sorry I had in mind the set_forwarding was partially successful
> for 3 of 4 LPIs but it is a unitary operation.
> > 
> > And such failure seems to be expected by the VFIO code, which just
> > prints a message on the console and set the producer token to NULL. So
> > returning an error from the KVM code is useful, at least to an extent.
> 
> OK. So with the revert, the use case resume working, right?

Yes, Shaokun confirmed he was back in business.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

      reply	other threads:[~2021-05-10  9:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-07  5:57 Shaokun Zhang
2021-05-07  9:03 ` Marc Zyngier
2021-05-07  9:58   ` Shaokun Zhang
2021-05-07 11:02     ` Marc Zyngier
     [not found]       ` <874kfepht4.wl-maz@kernel.org>
2021-05-08  1:51         ` Jason Wang
2021-05-08  6:56           ` Zhu, Lingshan
2021-05-08  9:15           ` Marc Zyngier
2021-05-09 17:00       ` Auger Eric
2021-05-10  7:49         ` Marc Zyngier
2021-05-10  8:29           ` Auger Eric
2021-05-10  9:59             ` Marc Zyngier [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y2cmoqog.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=tangnianyao@huawei.com \
    --cc=zhangshaokun@hisilicon.com \
    --subject='Re: Question on guest enable msi fail when using GICv4/4.1' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).