virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Jason Wang <jasowang@redhat.com>
Cc: xuanzhuo@linux.alibaba.com, mst@redhat.com, gal@nvidia.com,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com
Subject: Re: [PATCH RFC 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
Date: Mon, 28 Aug 2023 16:46:12 -0700	[thread overview]
Message-ID: <3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@oracle.com> (raw)
In-Reply-To: <CACGkMEtNZnGw+O2PkZkCgOd+NiU3kw7asYsGH3EkuOHd=GvCnA@mail.gmail.com>



On 8/22/2023 1:54 AM, Jason Wang wrote:
> On Thu, Aug 17, 2023 at 7:44 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 8/15/2023 6:48 PM, Jason Wang wrote:
>>> On Wed, Aug 16, 2023 at 6:31 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 8/14/2023 7:25 PM, Jason Wang wrote:
>>>>> On Tue, Aug 15, 2023 at 9:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> ---
>>>>>>     drivers/vhost/vdpa.c             | 16 +++++++++++++++-
>>>>>>     include/uapi/linux/vhost_types.h |  2 ++
>>>>>>     2 files changed, 17 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>>>>> index 62b0a01..75092a7 100644
>>>>>> --- a/drivers/vhost/vdpa.c
>>>>>> +++ b/drivers/vhost/vdpa.c
>>>>>> @@ -406,6 +406,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
>>>>>>            return ops->resume;
>>>>>>     }
>>>>>>
>>>>>> +static bool vhost_vdpa_has_persistent_map(const struct vhost_vdpa *v)
>>>>>> +{
>>>>>> +       struct vdpa_device *vdpa = v->vdpa;
>>>>>> +       const struct vdpa_config_ops *ops = vdpa->config;
>>>>>> +
>>>>>> +       return (!ops->set_map && !ops->dma_map) || ops->reset_map;
>>>>> So this means the IOTLB/IOMMU mappings have already been decoupled
>>>>> from the vdpa reset.
>>>> Not in the sense of API, it' been coupled since day one from the
>>>> implementations of every on-chip IOMMU parent driver, namely mlx5_vdpa
>>>> and vdpa_sim. Because of that, later on the (improper) support for
>>>> virtio-vdpa, from commit 6f5312f80183 ("vdpa/mlx5: Add support for
>>>> running with virtio_vdpa") and 6c3d329e6486 ("vdpa_sim: get rid of DMA
>>>> ops") misused the .reset() op to realize 1:1 mapping, rendering strong
>>>> coupling between device reset and reset of iotlb mappings. This series
>>>> try to rectify that implementation deficiency, while keep userspace
>>>> continuing to work with older kernel behavior.
>>>>
>>>>>     So it should have been noticed by the userspace.
>>>> Yes, userspace had noticed this no-chip IOMMU discrepancy since day one
>>>> I suppose. Unfortunately there's already code in userspace with this
>>>> assumption in mind that proactively tears down and sets up iotlb mapping
>>>> around vdpa device reset...
>>>>> I guess we can just fix the simulator and mlx5 then we are fine?
>>>> Only IF we don't care about running new QEMU on older kernels with
>>>> flawed on-chip iommu behavior around reset. But that's a big IF...
>>> So what I meant is:
>>>
>>> Userspace doesn't know whether the vendor specific mappings (set_map)
>>> are required or not. And in the implementation of vhost_vdpa, if
>>> platform IOMMU is used, the mappings are decoupled from the reset. So
>>> if the Qemu works with parents with platform IOMMU it means Qemu can
>>> work if we just decouple vendor specific mappings from the parents
>>> that uses set_map.
>> I was aware of this, and if you may notice I don't even offer a way
>> backward to retain/emulate the flawed vhost-iotlb reset behavior for
>> older userspace - I consider it more of a bug in .set_map driver
>> implementation of its own rather than what the vhost-vdpa iotlb
>> abstraction wishes to expose to userspace in the first place.
> That's my understanding as well.
>
>> If you ever look into QEMU's vhost_vdpa_reset_status() function, you may
>> see memory_listener_unregister() will be called to evict all of the
>> existing iotlb mappings right after vhost_vdpa_reset_device() across
>> device reset, and later on at vhost_vdpa_dev_start(),
>> memory_listener_register() will set up all iotlb mappings again. In an
>> ideal world without this on-chip iommu deficiency QEMU should not have
>> to behave this way - this is what I mentioned earlier that userspace had
>> already noticed the discrepancy and it has to "proactively tear down and
>> set up iotlb mapping around vdpa device reset". Apparently from
>> functionality perspective this trick works completely fine with platform
>> IOMMU, however, it's sub-optimal in the performance perspective.
> Right.
>
>> We can't simply fix QEMU by moving this memory_listener_unregister()
>> call out of the reset path unconditionally, as we don't want to break
>> the already-functioning older kernel even though it's suboptimal in
>> performance.
> I'm not sure how things can be broken in this case?
Things won't be broken if we don't care about performance, for example 
reboot a large memory VM (translated to device reset internally) will 
freeze the guest and introduce extra reboot delay unnecessarily. If we 
want to fix the performance by remove memory_listener_unregister() 
unconditionally and we don't have such a flag to distinguish, we will 
break network connectivity entirely after reset - as all mappings are 
purged during reset on older parent driver.

>   Or why it is specific to parent with set_map.
As if without the .reset_map op and corresponding driver implementation 
(in correct way), there's no appropriate means for on-chip iommu parent 
driver to persist iotlb mappings across reset, isn't it? If the driver 
deliberately removes it from .reset, they don't support 1:1 DMA mapping 
for virtio-vdpa on the other hand, for instance.

>
>> Instead, to keep new QEMU continuing to work on top of the
>> existing or older kernels, QEMU has to check this IOTLB_PERSIST feature
>> flag to decide whether it is safe not to bother flushing and setting up
>> iotlb across reset. For the platform IOMMU case, vdpa parent driver
>> won't implement either the .set_map or .dma_map op, so it should be
>> covered in the vhost_vdpa_has_persistent_map() check I suppose.
> Just to make sure we are at the same page.
>
>  From the userspace point of view, the IOTLB persists and vhost-vDPA
> doesn't reset the IOTLB during vDPA reset. But we have are two levels
> of the coupling in other places:
>
> 1) Qemu level: memory listener is coupled with DRIVER_OK/reset
> 2) vDPA parent level: mlx5 build/destroy MR during DRIVER_OK/reset
>
> If I understand you correctly, since we've coupled in 1), Qemu can't
> be aware of whether the mapping is coupled with reset or not?
I suspect it had been already noticed by someone who wrote this QEMU 
code since day one, just that there's lack of a comment documenting it. 
Or any other reason why QEMU had to decouple it in the first place? It 
affects performance across the board for platform IOMMU vdpa providers 
as well.

>   If we
> simply decouple in 1), memory mappigns might be lost during vDPA rset.
I would tend to say 1) is an inadvertent artifact or side effect of 2), 
as I do not see memory listeners are used like this in other QEMU 
subsystems, e.g. vhost, vfio. Consider this coupling in 1) had been in 
the play since day one with neither advanced vDPA features like SVQ nor 
equivalent deficiency in platform IOMMU vdpa providers, it's suspicious 
that mlx5 build/destroy MR during reset was the curlprit then.

Regards,
-Siwei

>
> Thanks
>
>>
>> Thanks,
>> -Siwei
>>> Thanks
>>>
>>>> Regards,
>>>> -Siwei
>>>>> Thanks
>>>>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2023-08-28 23:46 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-02 17:12 [PATCH 0/2] vdpa/mlx5: Fixes for ASID handling Dragos Tatulea via Virtualization
2023-08-02 17:12 ` [PATCH 1/2] vdpa/mlx5: Fix mr->initialized semantics Dragos Tatulea via Virtualization
2023-08-03  8:03   ` Jason Wang
2023-08-03 11:40     ` Dragos Tatulea via Virtualization
2023-08-08  2:57       ` Jason Wang
2023-08-08  7:24         ` Dragos Tatulea via Virtualization
2023-08-09  1:42           ` Jason Wang
2023-08-14 14:15             ` Dragos Tatulea via Virtualization
2023-08-15  1:28               ` Jason Wang
2023-08-03 17:57     ` Si-Wei Liu
2023-08-08  3:00       ` Jason Wang
2023-08-08 22:58         ` Si-Wei Liu
2023-08-09  6:52           ` Jason Wang
2023-08-10  0:40             ` Si-Wei Liu
2023-08-10  3:10               ` Jason Wang
2023-08-10 22:20                 ` Si-Wei Liu
2023-08-14  2:59                   ` Jason Wang
2023-08-15  1:43                     ` [PATCH RFC 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
2023-08-15  1:43                       ` [PATCH RFC 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
2023-08-15  2:21                         ` Jason Wang
2023-08-15 19:49                           ` Si-Wei Liu
2023-08-16  1:55                             ` Jason Wang
2023-08-17  0:05                               ` Si-Wei Liu
     [not found]                                 ` <CAJaqyWeC=G7fbgvmyCicnuGLYD84G5+b37tVA1KqzrSHO_AGDw@mail.gmail.com>
2023-08-21 22:31                                   ` Si-Wei Liu
2023-08-15  1:43                       ` [PATCH RFC 2/4] vdpa/mlx5: implement .reset_map driver op Si-Wei Liu
2023-08-15  8:26                         ` Dragos Tatulea via Virtualization
2023-08-15 23:11                           ` Si-Wei Liu
2023-08-15  1:43                       ` [PATCH RFC 3/4] vhost-vdpa: should restore 1:1 dma mapping before detaching driver Si-Wei Liu
2023-08-15  2:32                         ` Jason Wang
2023-08-15 23:09                           ` Si-Wei Liu
2023-08-15  1:43                       ` [PATCH RFC 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit Si-Wei Liu
2023-08-15  2:25                         ` Jason Wang
2023-08-15 22:30                           ` Si-Wei Liu
2023-08-16  1:48                             ` Jason Wang
2023-08-16 23:43                               ` Si-Wei Liu
2023-08-22  8:54                                 ` Jason Wang
2023-08-28 23:46                                   ` Si-Wei Liu [this message]
2023-08-02 17:12 ` [PATCH 2/2] vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary Dragos Tatulea via Virtualization
2023-08-10  8:54 ` [PATCH 0/2] vdpa/mlx5: Fixes for ASID handling Michael S. Tsirkin
2023-08-10  8:59   ` Jason Wang
2023-08-10  9:04   ` Dragos Tatulea via Virtualization

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@oracle.com \
    --to=si-wei.liu@oracle.com \
    --cc=eperezma@redhat.com \
    --cc=gal@nvidia.com \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).