From: Auger Eric <eric.auger@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org,
Alex Williamson <alex.williamson@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
Nianyao Tang <tangnianyao@huawei.com>,
Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: Question on guest enable msi fail when using GICv4/4.1
Date: Mon, 10 May 2021 10:29:47 +0200 [thread overview]
Message-ID: <69cd5989-f4cb-469c-f6a0-3362540e0271@redhat.com> (raw)
In-Reply-To: <871rafowp2.wl-maz@kernel.org>
Hi Marc,
On 5/10/21 9:49 AM, Marc Zyngier wrote:
> Hi Eric,
>
> On Sun, 09 May 2021 18:00:04 +0100,
> Auger Eric <eric.auger@redhat.com> wrote:
>>
>> Hi,
>> On 5/7/21 1:02 PM, Marc Zyngier wrote:
>>> On Fri, 07 May 2021 10:58:23 +0100,
>>> Shaokun Zhang <zhangshaokun@hisilicon.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> Thanks for your quick reply.
>>>>
>>>> On 2021/5/7 17:03, Marc Zyngier wrote:
>>>>> On Fri, 07 May 2021 06:57:04 +0100,
>>>>> Shaokun Zhang <zhangshaokun@hisilicon.com> wrote:
>>>>>>
>>>>>> [This letter comes from Nianyao Tang]
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Using GICv4/4.1 and msi capability, guest vf driver requires 3
>>>>>> vectors and enable msi, will lead to guest stuck.
>>>>>
>>>>> Stuck how?
>>>>
>>>> Guest serial does not response anymore and guest network shutdown.
>>>>
>>>>>
>>>>>> Qemu gets number of interrupts from Multiple Message Capable field
>>>>>> set by guest. This field is aligned to a power of 2(if a function
>>>>>> requires 3 vectors, it initializes it to 2).
>>>>>
>>>>> So I guess this is a MultiMSI device with 4 vectors, right?
>>>>>
>>>>
>>>> Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
>>>>
>>>>>> However, guest driver just sends 3 mapi-cmd to vits and 3 ite
>>>>>> entries is recorded in host. Vfio initializes msi interrupts using
>>>>>> the number of interrupts 4 provide by qemu. When it comes to the
>>>>>> 4th msi without ite in vits, in irq_bypass_register_producer,
>>>>>> producer and consumer will __connect fail, due to find_ite fail, and
>>>>>> do not resume guest.
>>>>>
>>>>> Let me rephrase this to check that I understand it:
>>>>> - The device has 4 vectors
>>>>> - The guest only create mappings for 3 of them
>>>>> - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
>>>>> - KVM doesn't have a mapping for the 4th vector and returns an error
>>>>> - VFIO disable this 4th vector
>>>>>
>>>>> Is that correct? If yes, I don't understand why that impacts the guest
>>>>> at all. From what I can see, vfio_msi_set_vector_signal() just prints
>>>>> a message on the console and carries on.
>>>>>
>>>>
>>>> function calls:
>>>> --> vfio_msi_set_vector_signal
>>>> --> irq_bypass_register_producer
>>>> -->__connect
>>>>
>>>> in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
>>>> and fails to get the 4th mapping. When add_producer fail, it does
>>>> not call cons->start, calls kvm_arch_irq_bypass_start and then
>>>> kvm_arm_resume_guest.
>>>
>>> [+Eric, who wrote the irq_bypass infrastructure.]
>>>
>>> Ah, so the guest is actually paused, not in a livelock situation
>>> (which is how I interpreted "stuck").
>>>
>>> I think we should handle this case gracefully, as there should be no
>>> expectation that the guest will be using this interrupt. Given that
>>> VFIO seems to be pretty unfazed when a producer fails, I'm temped to
>>> do the same thing and restart the guest.
>>>
>>> Also, __disconnect doesn't care about errors, so why should __connect
>>> have this odd behaviour?
>>
>> _disconnect() does not care as we should always succeed tearing off
>> things. del_* ops are void functions. On the opposite we can fail
>> setting up the bypass.
>>
>> Effectively
>> a979a6aa009f ("irqbypass: do not start cons/prod when failed connect")
>> needs to be reverted.
>>
>> I agree the kerneldoc comments in linux/irqbypass.h may be improved to
>> better explain the role of stop/start cbs and warn about their potential
>> global impact.
>
> Yup. It also begs the question of why we have producer callbacks, as
> nobody seems to use them.
At the time this was designed, I was working on VFIO platform IRQ
forwarding using direct EOI and they were used (and useful)
+ irq->producer.stop = vfio_platform_irq_bypass_stop;
+ irq->producer.start = vfio_platform_irq_bypass_start;
[PATCH v4 02/13] VFIO: platform: registration of a dummy IRQ bypass producer
[PATCH v4 07/13] VFIO: platform: add irq bypass producer management
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-November/017323.html
basically the IRQ was disabled and re-enabled. This series has never
been upstreamed but that's where it originates from.
>
>> wrt the case above, "in __connect, add_producer finally calls
>> kvm_vgic_v4_set_forwarding and fails to get the 4th mapping", shouldn't
>> we succeed in that case?
>
> From a KVM perspective, we can't return a success because there is no
> guest LPI that matches the input signal.
right, sorry I had in mind the set_forwarding was partially successful
for 3 of 4 LPIs but it is a unitary operation.
>
> And such failure seems to be expected by the VFIO code, which just
> prints a message on the console and set the producer token to NULL. So
> returning an error from the KVM code is useful, at least to an extent.
OK. So with the revert, the use case resume working, right?
Thanks
Eric
>
> Thanks,
>
> M.
>
next prev parent reply other threads:[~2021-05-10 8:30 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-07 5:57 Question on guest enable msi fail when using GICv4/4.1 Shaokun Zhang
2021-05-07 9:03 ` Marc Zyngier
2021-05-07 9:58 ` Shaokun Zhang
2021-05-07 11:02 ` Marc Zyngier
[not found] ` <874kfepht4.wl-maz@kernel.org>
2021-05-08 1:51 ` Jason Wang
2021-05-08 6:56 ` Zhu, Lingshan
2021-05-08 9:15 ` Marc Zyngier
2021-05-09 17:00 ` Auger Eric
2021-05-10 7:49 ` Marc Zyngier
2021-05-10 8:29 ` Auger Eric [this message]
2021-05-10 9:59 ` Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69cd5989-f4cb-469c-f6a0-3362540e0271@redhat.com \
--to=eric.auger@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=cohuck@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-pci@vger.kernel.org \
--cc=maz@kernel.org \
--cc=tangnianyao@huawei.com \
--cc=zhangshaokun@hisilicon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).