From: "Tian, Kevin" <kevin.tian@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
"cjia@nvidia.com" <cjia@nvidia.com>,
"Yang, Ziye" <ziye.yang@intel.com>,
"Liu, Changpeng" <changpeng.liu@intel.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"eskultet@redhat.com" <eskultet@redhat.com>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
"eauger@redhat.com" <eauger@redhat.com>,
"aik@ozlabs.ru" <aik@ozlabs.ru>,
"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
"felipe@nutanix.com" <felipe@nutanix.com>,
"Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
"Wang, Zhi A" <zhi.a.wang@intel.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.
Date: Fri, 15 Nov 2019 05:10:53 +0000 [thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D19D5F6A83@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <20191114202133.4b046cb9@x1.home>
> From: Alex Williamson
> Sent: Friday, November 15, 2019 11:22 AM
>
> On Thu, 14 Nov 2019 21:40:35 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> > On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:
> > > On Fri, 15 Nov 2019 00:26:07 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >
> > > > On 11/14/2019 1:37 AM, Alex Williamson wrote:
> > > > > On Thu, 14 Nov 2019 01:07:21 +0530
> > > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > >
> > > > >> On 11/13/2019 4:00 AM, Alex Williamson wrote:
> > > > >>> On Tue, 12 Nov 2019 22:33:37 +0530
> > > > >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > >>>
> > > > >>>> All pages pinned by vendor driver through vfio_pin_pages API
> should be
> > > > >>>> considered as dirty during migration. IOMMU container
> maintains a list of
> > > > >>>> all such pinned pages. Added an ioctl defination to get bitmap of
> such
> > > > >>>
> > > > >>> definition
> > > > >>>
> > > > >>>> pinned pages for requested IO virtual address range.
> > > > >>>
> > > > >>> Additionally, all mapped pages are considered dirty when
> physically
> > > > >>> mapped through to an IOMMU, modulo we discussed devices
> opting in to
> > > > >>> per page pinning to indicate finer granularity with a TBD
> mechanism to
> > > > >>> figure out if any non-opt-in devices remain.
> > > > >>>
> > > > >>
> > > > >> You mean, in case of device direct assignment (device pass
> through)?
> > > > >
> > > > > Yes, or IOMMU backed mdevs. If vfio_dmas in the container are
> fully
> > > > > pinned and mapped, then the correct dirty page set is all mapped
> pages.
> > > > > We discussed using the vpfn list as a mechanism for vendor drivers
> to
> > > > > reduce their migration footprint, but we also discussed that we
> would
> > > > > need a way to determine that all participants in the container have
> > > > > explicitly pinned their working pages or else we must consider the
> > > > > entire potential working set as dirty.
> > > > >
> > > >
> > > > How can vendor driver tell this capability to iommu module? Any
> suggestions?
> > >
> > > I think it does so by pinning pages. Is it acceptable that if the
> > > vendor driver pins any pages, then from that point forward we consider
> > > the IOMMU group dirty page scope to be limited to pinned pages?
> There
> > > are complications around non-singleton IOMMU groups, but I think
> we're
> > > already leaning towards that being a non-worthwhile problem to solve.
> > > So if we require that only singleton IOMMU groups can pin pages and
> we
> > > pass the IOMMU group as a parameter to
> > > vfio_iommu_driver_ops.pin_pages(), then the type1 backend can set a
> > > flag on its local vfio_group struct to indicate dirty page scope is
> > > limited to pinned pages. We might want to keep a flag on the
> > > vfio_iommu struct to indicate if all of the vfio_groups for each
> > > vfio_domain in the vfio_iommu.domain_list dirty page scope limited to
> > > pinned pages as an optimization to avoid walking lists too often. Then
> > > we could test if vfio_iommu.domain_list is not empty and this new flag
> > > does not limit the dirty page scope, then everything within each
> > > vfio_dma is considered dirty.
> > >
> >
> > hi Alex
> > could you help clarify whether my understandings below are right?
> > In future,
> > 1. for mdev and for passthrough device withoug hardware ability to track
> > dirty pages, the vendor driver has to explicitly call
> > vfio_pin_pages()/vfio_unpin_pages() + a flag to tell vfio its dirty page set.
>
> For non-IOMMU backed mdevs without hardware dirty page tracking,
> there's no change to the vendor driver currently. Pages pinned by the
> vendor driver are marked as dirty.
What about the vendor driver can figure out, in software means, which
pinned pages are actually dirty? In that case, would a separate mark_dirty
interface make more sense? Or introduce read/write flag to the pin_pages
interface similar to DMA API? Existing drivers always set both r/w flags but
just in case then a specific driver may set read-only or write-only...
>
> For any IOMMU backed device, mdev or direct assignment, all mapped
> memory would be considered dirty unless there are explicit calls to pin
> pages on top of the IOMMU page pinning and mapping. These would likely
> be enabled only when the device is in the _SAVING device_state.
>
> > 2. for those devices with hardware ability to track dirty pages, will still
> > provide a callback to vendor driver to get dirty pages. (as for those
> devices,
> > it is hard to explicitly call vfio_pin_pages()/vfio_unpin_pages())
> >
> > 3. for devices relying on dirty bit info in physical IOMMU, there
> > will be a callback to physical IOMMU driver to get dirty page set from
> > vfio.
>
> The proposal here does not cover exactly how these would be
> implemented, it only establishes the container as the point of user
> interaction with the dirty bitmap and hopefully allows us to maintain
> that interface regardless of whether we have dirty tracking at the
> device or the system IOMMU. Ideally devices with dirty tracking would
> make use of page pinning and we'd extend the interface to allow vendor
> drivers the ability to indicate the clean/dirty state of those pinned
I don't think "dirty tracking" == "page pinning". It's possible that a device
support tracking/logging dirty page info into a driver-registered buffer,
then the host vendor driver doesn't need to mediate fast-path operations.
In such case, the entire guest memory is always pinned and we just need
a log-sync like interface for vendor driver to fill dirty bitmap.
> pages. For system IOMMU dirty page tracking, that potentially might
> mean that we support IOMMU page faults and the container manages
> those
> faults such that the container is the central record of dirty pages.
IOMMU dirty-bit is not equivalent to IOMMU page fault. The latter
is much more complex which requires support both in IOMMU and in
device. Here similar to above device-dirty-tracking case, we just need a
log-sync interface calling into iommu driver to get dirty info filled for
requested address range.
> Until these interfaces are designed, we can only speculate, but the
> goal is to design a user interface compatible with how those features
> might evolve. If you identify something that can't work, please raise
> the issue. Thanks,
>
> Alex
Here is the desired scheme in my mind. Feel free to correct me. :-)
1. iommu log-buf callback is preferred if underlying IOMMU reports
such capability. The iommu driver walks IOMMU page table to find
dirty pages for requested address range;
2. otherwise vendor driver log-buf callback is preferred if the vendor
driver reports such capability when registering mdev types. The
vendor driver calls device-specific interface to fill dirty info;
3. otherwise pages pined by vfio_pin_pages (with WRITE flag) are
considered dirty. This covers normal mediated devices or using
fast-path mediation for migrating passthrough device;
4. otherwise all mapped pages are considered dirty;
Currently we're working on 1) based on VT-d rev3.0. I know some
vendors implement 2) in their own code base. 3) has real usages
already. 4) is the fall-back.
Alex, are you willing to have all the interfaces ready in one batch,
or support them based on available usages? I'm fine with either
way, but even just doing 3/4 in this series, I'd prefer to having
above scheme included in the code comment, to give the whole
picture of all possible situations. :-)
Thanks
Kevin
next prev parent reply other threads:[~2019-11-15 5:10 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-12 17:03 [PATCH v9 Kernel 0/5] Add KABIs to support migration for VFIO devices Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-13 3:23 ` Yan Zhao
2019-11-13 19:02 ` Kirti Wankhede
2019-11-14 0:36 ` Yan Zhao
2019-11-14 18:55 ` Kirti Wankhede
2019-11-13 10:24 ` Cornelia Huck
2019-11-13 18:27 ` Alex Williamson
2019-11-13 19:29 ` Kirti Wankhede
2019-11-13 19:48 ` Alex Williamson
2019-11-13 20:17 ` Kirti Wankhede
2019-11-13 20:40 ` Alex Williamson
2019-11-14 18:49 ` Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-13 19:37 ` Kirti Wankhede
2019-11-13 20:07 ` Alex Williamson
2019-11-14 18:56 ` Kirti Wankhede
2019-11-14 21:06 ` Alex Williamson
2019-11-15 2:40 ` Yan Zhao
2019-11-15 3:21 ` Alex Williamson
2019-11-15 5:10 ` Tian, Kevin [this message]
2019-11-19 23:16 ` Alex Williamson
2019-11-20 1:04 ` Tian, Kevin
2019-11-20 1:51 ` Yan Zhao
2019-11-26 0:57 ` Yan Zhao
2019-12-03 18:04 ` Alex Williamson
2019-12-04 18:10 ` Kirti Wankhede
2019-12-04 18:34 ` Alex Williamson
2019-12-05 1:28 ` Yan Zhao
2019-12-05 5:42 ` Kirti Wankhede
2019-12-05 5:47 ` Yan Zhao
2019-12-05 5:56 ` Alex Williamson
2019-12-05 6:19 ` Kirti Wankhede
2019-12-05 6:40 ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-13 19:52 ` Kirti Wankhede
2019-11-13 20:22 ` Alex Williamson
2019-11-14 18:56 ` Kirti Wankhede
2019-11-14 21:08 ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AADFC41AFE54684AB9EE6CBC0274A5D19D5F6A83@SHSMSX104.ccr.corp.intel.com \
--to=kevin.tian@intel.com \
--cc=Ken.Xue@amd.com \
--cc=Zhengxiao.zx@Alibaba-inc.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=changpeng.liu@intel.com \
--cc=cjia@nvidia.com \
--cc=cohuck@redhat.com \
--cc=dgilbert@redhat.com \
--cc=eauger@redhat.com \
--cc=eskultet@redhat.com \
--cc=felipe@nutanix.com \
--cc=jonathan.davies@nutanix.com \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=mlevitsk@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=shuangtai.tst@alibaba-inc.com \
--cc=yan.y.zhao@intel.com \
--cc=yi.l.liu@intel.com \
--cc=zhi.a.wang@intel.com \
--cc=ziye.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).