From: Jason Wang <jasowang@redhat.com>
To: Marc Zyngier <maz@kernel.org>,
	Zhu Lingshan <lingshan.zhu@intel.com>,
	Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Nianyao Tang <tangnianyao@huawei.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Eric Auger <eric.auger@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: Question on guest enable msi fail when using GICv4/4.1
Date: Sat, 8 May 2021 09:51:39 +0800
Message-ID: <373c70d3-eda3-8e84-d138-2f90d4e55217@redhat.com>
In-Reply-To: <874kfepht4.wl-maz@kernel.org>

On 2021/5/8 1:36 AM, Marc Zyngier wrote:
> On Fri, 07 May 2021 12:02:57 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
>> On Fri, 07 May 2021 10:58:23 +0100,
>> Shaokun Zhang <zhangshaokun@hisilicon.com> wrote:
>>> Hi Marc,
>>> Thanks for your quick reply.
>>> On 2021/5/7 17:03, Marc Zyngier wrote:
>>>> On Fri, 07 May 2021 06:57:04 +0100,
>>>> Shaokun Zhang <zhangshaokun@hisilicon.com> wrote:
>>>>> [This letter comes from Nianyao Tang]
>>>>> Hi,
>>>>> Using GICv4/4.1 and the MSI capability, when the guest VF driver
>>>>> requests 3 vectors and enables MSI, the guest gets stuck.
>>>> Stuck how?
>>> The guest serial console no longer responds and the guest network shuts down.
>>>>> QEMU gets the number of interrupts from the Multiple Message Capable
>>>>> field set by the guest. This field is aligned to a power of 2 (if a
>>>>> function requires 3 vectors, it initializes it to 2).
>>>> So I guess this is a MultiMSI device with 4 vectors, right?
>>> Yes, it can support a maximum of 32 MSI interrupts, and the VF driver only uses 3 MSIs.
>>>>> However, the guest driver just sends 3 mapi-cmd to the vITS, and 3 ITE
>>>>> entries are recorded in the host. VFIO initializes MSI interrupts using
>>>>> the number of interrupts (4) provided by QEMU. When it comes to the
>>>>> 4th MSI, which has no ITE in the vITS, in irq_bypass_register_producer,
>>>>> the producer and consumer fail to __connect because find_ite fails,
>>>>> and the guest is not resumed.
>>>> Let me rephrase this to check that I understand it:
>>>> - The device has 4 vectors
>>>> - The guest only creates mappings for 3 of them
>>>> - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
>>>> - KVM doesn't have a mapping for the 4th vector and returns an error
>>>> - VFIO disables this 4th vector
>>>> Is that correct? If yes, I don't understand why that impacts the guest
>>>> at all. From what I can see, vfio_msi_set_vector_signal() just prints
>>>> a message on the console and carries on.
>>> function calls:
>>> --> vfio_msi_set_vector_signal
>>>     --> irq_bypass_register_producer
>>>        -->__connect
>>> in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
>>> and fails to get the 4th mapping. When add_producer fails, __connect
>>> does not call cons->start, which would call kvm_arch_irq_bypass_start
>>> and then kvm_arm_resume_guest.
>> [+Eric, who wrote the irq_bypass infrastructure.]
>> Ah, so the guest is actually paused, not in a livelock situation
>> (which is how I interpreted "stuck").
>> I think we should handle this case gracefully, as there should be no
>> expectation that the guest will be using this interrupt. Given that
>> VFIO seems to be pretty unfazed when a producer fails, I'm tempted to
>> do the same thing and restart the guest.
>> Also, __disconnect doesn't care about errors, so why should __connect
>> have this odd behaviour?
>> Can you please try this? It is completely untested (and I think the
>> del_consumer call is odd, which is why I've also dropped it).
>> Eric, what do you think?
> Adding Zhu, Jason, MST to the party. It all seems to be caused by this
> commit:
> commit a979a6aa009f3c99689432e0cdb5402a4463fb88
> Author: Zhu Lingshan <lingshan.zhu@intel.com>
> Date:   Fri Jul 31 14:55:33 2020 +0800
>      irqbypass: do not start cons/prod when failed connect
>      If failed to connect, there is no need to start consumer nor
>      producer.
>      Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>      Suggested-by: Jason Wang <jasowang@redhat.com>
>      Link: https://lore.kernel.org/r/20200731065533.4144-7-lingshan.zhu@intel.com
>      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Zhu, I'd really like to understand why you think it is OK not to
> restart consumer and producers when a connection has failed to be
> established between the two?

My bad, I didn't check the ARM code, but it's not easy to infer that
cons->start/stop is not a per-consumer operation but a global one,
like VM halting/resuming.

> In the case of KVM/arm64, this results in the guest being forever
> suspended and never resumed. That's obviously not an acceptable
> regression, as there are a number of benign reasons for a connect to
> fail.

Let's revert this commit.
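
To make the failure mode concrete, here is a minimal user-space sketch
of the connect path. It is not the kernel code: every toy_* name is
hypothetical, a single consumer stands in for all vectors, and the only
assumption taken from the thread is that the consumer stop/start
callbacks pause and resume the whole guest while add_producer fails for
a vector the guest never mapped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VM; "paused" models the KVM/arm64 halt/resume
 * performed by the consumer stop/start callbacks (assumption). */
struct toy_vm {
	bool paused;
	size_t nr_mapped;	/* vectors the guest actually mapped */
};

struct toy_consumer {
	struct toy_vm *vm;
};

static void toy_cons_stop(struct toy_consumer *cons)
{
	cons->vm->paused = true;	/* global: halts the whole guest */
}

static void toy_cons_start(struct toy_consumer *cons)
{
	cons->vm->paused = false;	/* global: resumes the whole guest */
}

/* Fails for vectors without a guest mapping, e.g. the 4th of 4. */
static int toy_add_producer(struct toy_consumer *cons, size_t vector)
{
	return vector < cons->vm->nr_mapped ? 0 : -1;
}

/* start_on_failure == false models skipping start after a failed
 * connect; true models always restarting the consumer. */
static int toy_connect(struct toy_consumer *cons, size_t vector,
		       bool start_on_failure)
{
	int ret;

	toy_cons_stop(cons);
	ret = toy_add_producer(cons, vector);
	if (ret == 0 || start_on_failure)
		toy_cons_start(cons);
	return ret;
}

/* Connect nr_vectors vectors in order; report whether the guest is
 * still paused once all of them have been attempted. */
static bool toy_guest_left_paused(size_t nr_vectors, size_t nr_mapped,
				  bool start_on_failure)
{
	struct toy_vm vm = { .paused = false, .nr_mapped = nr_mapped };
	struct toy_consumer cons = { .vm = &vm };
	size_t i;

	for (i = 0; i < nr_vectors; i++)
		(void)toy_connect(&cons, i, start_on_failure);
	return vm.paused;
}
```

With 4 vectors and 3 mappings, toy_guest_left_paused(4, 3, false)
reports the guest as still paused, while passing true leaves it
running, which is the behaviour a revert would restore in this model.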


> Thanks,
> 	M.
