From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com,
yi.l.liu@intel.com, cjia@nvidia.com, kvm@vger.kernel.org,
eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org,
cohuck@redhat.com, shuangtai.tst@alibaba-inc.com,
dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com,
pasic@linux.ibm.com, aik@ozlabs.ru, eauger@redhat.com,
felipe@nutanix.com, jonathan.davies@nutanix.com,
yan.y.zhao@intel.com, changpeng.liu@intel.com, Ken.Xue@amd.com
Subject: Re: [PATCH v11 Kernel 6/6] vfio: Selective dirty page tracking if IOMMU backed device pins pages
Date: Thu, 9 Jan 2020 02:22:26 +0530
Message-ID: <17069da7-279b-872f-db15-d9995cf46285@nvidia.com>
In-Reply-To: <20200107170929.74c9c92e@w520.home>
On 1/8/2020 5:39 AM, Alex Williamson wrote:
> On Wed, 8 Jan 2020 02:15:01 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>
>> On 12/18/2019 5:42 AM, Alex Williamson wrote:
>>> On Tue, 17 Dec 2019 22:40:51 +0530
>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>
>>
>> <snip>
>>
>>>
>>> This will fail when there are devices within the IOMMU group that are
>>> not represented as vfio_devices. My original suggestion was:
>>>
>>> On Thu, 14 Nov 2019 14:06:25 -0700
>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>> I think it does so by pinning pages. Is it acceptable that if the
>>>> vendor driver pins any pages, then from that point forward we consider
>>>> the IOMMU group dirty page scope to be limited to pinned pages? There
>>>> are complications around non-singleton IOMMU groups, but I think we're
>>>> already leaning towards that being a non-worthwhile problem to solve.
>>>> So if we require that only singleton IOMMU groups can pin pages and we
>>>
>>> We could tag vfio_groups as singleton at vfio_add_group_dev() time with
>>> an iommu_group_for_each_dev() walk so that we can cache the value on
>>> the struct vfio_group.
>>
>> I don't think iommu_group_for_each_dev() is required. Checking
>> group->device_list in vfio_add_group_dev() if there are more than one
>> device should work, right?
>>
>> list_for_each_entry(vdev, &group->device_list, group_next) {
>>     if (group->is_singleton) {
>>         group->is_singleton = false;
>>         break;
>>     } else {
>>         group->is_singleton = true;
>>     }
>> }
>
> Hmm, I think you're taking a different approach to this than I was
> thinking. Re-reading my previous comments, the fact that both vfio.c
> and vfio_iommu_type1.c each have their own private struct vfio_group
> makes things rather unclear. I was intending to use the struct
> iommu_group as the object vfio.c provides to type1.c to associate the
> pinning. This would require that not only the vfio view of devices in
> the group to be singleton, but also the actual iommu group to be
> singleton. Otherwise the set of devices vfio.c has in the group might
> only be a subset of the group. Maybe a merger of the approaches is
> easier though.
>
> Tracking whether the vfio.c view of a group is singleton is even easier
> than above, we could simply add a device_count field to vfio_group,
> increment it in vfio_group_create_device() and decrement it in
> vfio_device_release(). vfio_pin_pages() could return error if
> device_count is not 1. We could still add the iommu_group pointer to
> the type1 pin_pages callback, but perhaps type1 simply assumes that the
> group is singleton when pin pages is called and it's vfio.c's
> responsibility to maintain that group as singleton once pages have been
> pinned. vfio.c would therefore also need to set a field on the
> vfio_group if pages have been pinned such that vfio_add_group_dev()
> could return error if a new device attempts to join the group. We'd
> need to make sure that field is cleared when the group is released from
> use and pay attention to races that might occur between adding devices
> to a group and pinning pages.
>
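A minimal user-space sketch of the device_count idea quoted above, only
to make the proposed checks concrete. The struct and function names
(model_group, model_add_device, and so on) are hypothetical stand-ins,
not the kernel's vfio.c code. Using -EINVAL for the add-device failure
is an assumption, and locking (the races mentioned above) is left out
entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the vfio.c view of a group; not struct vfio_group. */
struct model_group {
    int  device_count;  /* devices vfio.c currently has in the group */
    bool has_pinned;    /* set once pages have been pinned */
};

/* Models the check suggested for vfio_add_group_dev(). */
static int model_add_device(struct model_group *g)
{
    if (g->has_pinned)
        return -EINVAL; /* assumed errno: group must stay singleton */
    g->device_count++;
    return 0;
}

/* Models the decrement suggested for vfio_device_release(). */
static void model_remove_device(struct model_group *g)
{
    assert(g->device_count > 0);
    g->device_count--;
}

/* Models the check suggested for vfio_pin_pages(). */
static int model_pin_pages(struct model_group *g)
{
    if (g->device_count != 1)
        return -EINVAL;
    g->has_pinned = true;
    return 0;
}

/* Models clearing the field when the group is released from use. */
static void model_group_release(struct model_group *g)
{
    g->has_pinned = false;
}
```

In this model, pinning fails on an empty or multi-device group, and
once a pin succeeds a second device cannot join until the group has
been released.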
Thinking aloud: could adding a singleton check cause issues in the near
future? Support for p2p and direct RDMA may be added for mdev devices
later. In that case the two devices should be in the same iommu_domain
but in different iommu_groups. Is that understanding correct?
>>> vfio_group_nb_add_dev() could update this if
>>> the IOMMU group composition changes.
>>
>> I don't see vfio_group_nb_add_dev() calls vfio_add_group_dev() (?)
>> If checking is_singleton is taken care in vfio_group_nb_add_dev(), which
>> is the only place where vfio_group is allocated, that should work, I think.
>
> This was relative to maintaining that the iommu group itself is
> singleton, not just the vfio view of the group. If we use the latter
> as our basis, then you're right, we shouldn't need this, but vfio.c would
> need to enforce that the group remains singleton if it has pinned
> pages. Does that make sense? Thanks,
>
Which route should be taken - iommu_group view or vfio.c group view?
Thanks,
Kirti
> Alex
>
>>> vfio_pin_pages() could return
>>> -EINVAL if (!group->is_singleton).
>>>
>>>> pass the IOMMU group as a parameter to
>>>> vfio_iommu_driver_ops.pin_pages(), then the type1 backend can set a
>>>> flag on its local vfio_group struct to indicate dirty page scope is
>>>> limited to pinned pages.
>>>
>>> ie. vfio_iommu_type1_unpin_pages() calls find_iommu_group() on each
>>> domain in domain_list and the external_domain using the struct
>>> iommu_group pointer provided by vfio-core. We set a new attribute on
>>> the vfio_group to indicate that vfio_group has (at some point) pinned
>>> pages.
>>>
>>>> We might want to keep a flag on the
>>>> vfio_iommu struct to indicate if all of the vfio_groups for each
>>>> vfio_domain in the vfio_iommu.domain_list dirty page scope limited to
>>>> pinned pages as an optimization to avoid walking lists too often. Then
>>>> we could test if vfio_iommu.domain_list is not empty and this new flag
>>>> does not limit the dirty page scope, then everything within each
>>>> vfio_dma is considered dirty.
>>>
>>> So at the point where we change vfio_group.has_pinned_pages from false
>>> to true, or a group is added or removed, we walk all the groups in the
>>> vfio_iommu and if they all have has_pinned_pages set, we can set a
>>> vfio_iommu.pinned_page_dirty_scope flag to true. If that flag is
>>> already true on page pinning, we can skip the lookup.
>>>
>>> I still like this approach better, it doesn't require a callback from
>>> type1 to vfio-core and it doesn't require a heavy weight walking for
>>> group devices and vfio data structures every time we fill a bitmap.
>>> Did you run into issues trying to implement this approach?
>>
>> Thanks for elaborative steps.
>> This works. Changing this last commit.
>>
>> Thanks,
>> Kirti
>>
>
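For reference, the pinned_page_dirty_scope bookkeeping discussed in the
quoted text can be sketched as a small user-space model. All names here
(scope_group, scope_iommu, scope_update, scope_group_pinned) are
hypothetical and the array of groups stands in for the per-domain group
lists. Treating an iommu with no groups as "scope not limited" is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the type1 backend's per-group state. */
struct scope_group {
    bool has_pinned_pages;
};

/* Hypothetical stand-in for struct vfio_iommu. */
struct scope_iommu {
    struct scope_group *groups;
    size_t ngroups;
    bool pinned_page_dirty_scope; /* true: only pinned pages are dirty */
};

/* Walk all groups; the scope is limited only if every group has pinned. */
static void scope_update(struct scope_iommu *iommu)
{
    bool all = iommu->ngroups > 0; /* assumption: empty means not limited */

    for (size_t i = 0; i < iommu->ngroups; i++) {
        if (!iommu->groups[i].has_pinned_pages) {
            all = false;
            break;
        }
    }
    iommu->pinned_page_dirty_scope = all;
}

/* Called when group idx pins pages. */
static void scope_group_pinned(struct scope_iommu *iommu, size_t idx)
{
    assert(idx < iommu->ngroups);
    if (iommu->pinned_page_dirty_scope)
        return; /* flag already set, skip the lookup */
    if (!iommu->groups[idx].has_pinned_pages) {
        iommu->groups[idx].has_pinned_pages = true;
        scope_update(iommu); /* false -> true transition: re-walk */
    }
}
```

In this model scope_update() would also be called whenever a group is
added or removed, so filling a bitmap only needs to test the one flag.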