From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: "Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"cjia@nvidia.com" <cjia@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"eskultet@redhat.com" <eskultet@redhat.com>,
"Yang, Ziye" <ziye.yang@intel.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"Wang, Zhi A" <zhi.a.wang@intel.com>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
"aik@ozlabs.ru" <aik@ozlabs.ru>,
"eauger@redhat.com" <eauger@redhat.com>,
"felipe@nutanix.com" <felipe@nutanix.com>,
"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
Yan Zhao <yan.y.zhao@intel.com>,
"Liu, Changpeng" <changpeng.liu@intel.com>,
"Ken.Xue@amd.com" <Ken.Xue@amd.com>
Subject: Re: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.
Date: Wed, 4 Dec 2019 23:40:12 -0700 [thread overview]
Message-ID: <20191204234012.25f6159e@x1.home> (raw)
In-Reply-To: <77538a77-91ec-8a60-1f87-3fc18bd1cfe4@nvidia.com>
On Thu, 5 Dec 2019 11:49:00 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:
> On 12/5/2019 11:26 AM, Alex Williamson wrote:
> > On Thu, 5 Dec 2019 11:12:23 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >
> >> On 12/5/2019 6:58 AM, Yan Zhao wrote:
> >>> On Thu, Dec 05, 2019 at 02:34:57AM +0800, Alex Williamson wrote:
> >>>> On Wed, 4 Dec 2019 23:40:25 +0530
> >>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>
> >>>>> On 12/3/2019 11:34 PM, Alex Williamson wrote:
> >>>>>> On Mon, 25 Nov 2019 19:57:39 -0500
> >>>>>> Yan Zhao <yan.y.zhao@intel.com> wrote:
> >>>>>>
> >>>>>>> On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:
> >>>>>>>> On Fri, 15 Nov 2019 00:26:07 +0530
> >>>>>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>>>>>
> >>>>>>>>> On 11/14/2019 1:37 AM, Alex Williamson wrote:
> >>>>>>>>>> On Thu, 14 Nov 2019 01:07:21 +0530
> >>>>>>>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On 11/13/2019 4:00 AM, Alex Williamson wrote:
> >>>>>>>>>>>> On Tue, 12 Nov 2019 22:33:37 +0530
> >>>>>>>>>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> All pages pinned by vendor driver through vfio_pin_pages API should be
> >>>>>>>>>>>>> considered as dirty during migration. IOMMU container maintains a list of
> >>>>>>>>>>>>> all such pinned pages. Added an ioctl defination to get bitmap of such
> >>>>>>>>>>>>
> >>>>>>>>>>>> definition
> >>>>>>>>>>>>
> >>>>>>>>>>>>> pinned pages for requested IO virtual address range.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Additionally, all mapped pages are considered dirty when physically
> >>>>>>>>>>>> mapped through to an IOMMU, modulo we discussed devices opting in to
> >>>>>>>>>>>> per page pinning to indicate finer granularity with a TBD mechanism to
> >>>>>>>>>>>> figure out if any non-opt-in devices remain.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> You mean, in case of device direct assignment (device pass through)?
> >>>>>>>>>>
> >>>>>>>>>> Yes, or IOMMU backed mdevs. If vfio_dmas in the container are fully
> >>>>>>>>>> pinned and mapped, then the correct dirty page set is all mapped pages.
> >>>>>>>>>> We discussed using the vpfn list as a mechanism for vendor drivers to
> >>>>>>>>>> reduce their migration footprint, but we also discussed that we would
> >>>>>>>>>> need a way to determine that all participants in the container have
> >>>>>>>>>> explicitly pinned their working pages or else we must consider the
> >>>>>>>>>> entire potential working set as dirty.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> How can vendor driver tell this capability to iommu module? Any suggestions?
> >>>>>>>>
> >>>>>>>> I think it does so by pinning pages. Is it acceptable that if the
> >>>>>>>> vendor driver pins any pages, then from that point forward we consider
> >>>>>>>> the IOMMU group dirty page scope to be limited to pinned pages? There
> >>>>>>> we should also be aware of that dirty page scope is pinned pages + unpinned pages,
> >>>>>>> which means ever since a page is pinned, it should be regarded as dirty
> >>>>>>> no matter whether it's unpinned later. only after log_sync is called and
> >>>>>>> dirty info retrieved, its dirty state should be cleared.
> >>>>>>
> >>>>>> Yes, good point. We can't just remove a vpfn when a page is unpinned
> >>>>>> or else we'd lose information that the page potentially had been
> >>>>>> dirtied while it was pinned. Maybe that vpfn needs to move to a dirty
> >>>>>> list and both the currently pinned vpfns and the dirty vpfns are walked
> >>>>>> on a log_sync. The dirty vpfns list would be cleared after a log_sync.
> >>>>>> The container would need to know that dirty tracking is enabled and
> >>>>>> only manage the dirty vpfns list when necessary. Thanks,
> >>>>>>
> >>>>>
> >>>>> If page is unpinned, then that page is available in free page pool for
> >>>>> others to use, then how can we say that unpinned page has valid data?
> >>>>>
> >>>>> If suppose, one driver A unpins a page and when driver B of some other
> >>>>> device gets that page and he pins it, uses it, and then unpins it, then
> >>>>> how can we say that page has valid data for driver A?
> >>>>>
> >>>>> Can you give one example where unpinned page data is considered reliable
> >>>>> and valid?
> >>>>
> >>>> We can only pin pages that the user has already allocated* and mapped
> >>>> through the vfio DMA API. The pinning of the page simply locks the
> >>>> page for the vendor driver to access it and unpinning that page only
> >>>> indicates that access is complete. Pages are not freed when a vendor
> >>>> driver unpins them, they still exist and at this point we're now
> >>>> assuming the device dirtied the page while it was pinned. Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>> * An exception here is that the page might be demand allocated and the
> >>>> act of pinning the page could actually allocate the backing page for
> >>>> the user if they have not faulted the page to trigger that allocation
> >>>> previously. That page remains mapped for the user's virtual address
> >>>> space even after the unpinning though.
> >>>>
> >>>
> >>> Yes, I can give an example in GVT.
> >>> when a gem_object is allocated in guest, before submitting it to guest
> >>> vGPU, gfx cmds in its ring buffer need to be pinned into GGTT to get a
> >>> global graphics address for hardware access. At that time, we shadow
> >>> those cmds and pin pages through vfio pin_pages(), and submit the shadow
> >>> gem_object to physial hardware.
> >>> After guest driver thinks the submitted gem_object has completed hardware
> >>> DMA, it unnpinnd those pinned GGTT graphics memory addresses. Then in
> >>> host, we unpin the shadow pages through vfio unpin_pages.
> >>> But, at this point, guest driver is still free to access the gem_object
> >>> through vCPUs, and guest user space is probably still mapping an object
> >>> into the gem_object in guest driver.
> >>> So, missing the dirty page tracking for unpinned pages would cause
> >>> data inconsitency.
> >>>
> >>
> >> If pages are accessed by guest through vCPUs, then RAM module in QEMU
> >> will take care of tracking those pages as dirty.
> >>
> >> All unpinned pages might not be used, so tracking all unpinned pages
> >> during VM or application life time would also lead to tracking lots of
> >> stale pages, even though they are not being used. Increasing number of
> >> not needed pages could also lead to increasing migration data leading
> >> increase in migration downtime.
> >
> > We can't rely on the vCPU also dirtying a page, the overhead is
> > unavoidable. It doesn't matter if the migration is fast if it's
> > incorrect. We only need to track unpinned dirty pages while the
> > migration is active and the tracking is flushed on each log_sync
> > callback. Thanks,
> >
>
> From Yan's comment, pasted below, I thought, need to track all unpinned
> pages during application or VM's lifetime.
>
> > There we should also be aware of that dirty page scope is pinned
> pages > + unpinned pages, which means ever since a page is pinned, it
> should
> > be regarded as dirty no matter whether it's unpinned later.
>
> But if its about tracking pages which are unpinned "while the migration
> is active", then that set would be less, will do this change.
When we first start a pre-copy, all RAM (including anything previously
pinned) is dirty anyway. I believe the log_sync callback is only
intended to report pages dirtied since the migration was started, or
since the last log_sync callback. We then assume that currently pinned
pages are constantly dirty and previously pinned pages are dirty until
we've reported them through log_sync. Thanks,
Alex
next prev parent reply other threads:[~2019-12-05 6:42 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-12 17:03 [PATCH v9 Kernel 0/5] Add KABIs to support migration for VFIO devices Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-13 3:23 ` Yan Zhao
2019-11-13 19:02 ` Kirti Wankhede
2019-11-14 0:36 ` Yan Zhao
2019-11-14 18:55 ` Kirti Wankhede
2019-11-13 10:24 ` Cornelia Huck
2019-11-13 18:27 ` Alex Williamson
2019-11-13 19:29 ` Kirti Wankhede
2019-11-13 19:48 ` Alex Williamson
2019-11-13 20:17 ` Kirti Wankhede
2019-11-13 20:40 ` Alex Williamson
2019-11-14 18:49 ` Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-13 19:37 ` Kirti Wankhede
2019-11-13 20:07 ` Alex Williamson
2019-11-14 18:56 ` Kirti Wankhede
2019-11-14 21:06 ` Alex Williamson
2019-11-15 2:40 ` Yan Zhao
2019-11-15 3:21 ` Alex Williamson
2019-11-15 5:10 ` Tian, Kevin
2019-11-19 23:16 ` Alex Williamson
2019-11-20 1:04 ` Tian, Kevin
2019-11-20 1:51 ` Yan Zhao
2019-11-26 0:57 ` Yan Zhao
2019-12-03 18:04 ` Alex Williamson
2019-12-04 18:10 ` Kirti Wankhede
2019-12-04 18:34 ` Alex Williamson
2019-12-05 1:28 ` Yan Zhao
2019-12-05 5:42 ` Kirti Wankhede
2019-12-05 5:47 ` Yan Zhao
2019-12-05 5:56 ` Alex Williamson
2019-12-05 6:19 ` Kirti Wankhede
2019-12-05 6:40 ` Alex Williamson [this message]
2019-11-12 17:03 ` [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-13 19:52 ` Kirti Wankhede
2019-11-13 20:22 ` Alex Williamson
2019-11-14 18:56 ` Kirti Wankhede
2019-11-14 21:08 ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap Kirti Wankhede
2019-11-12 22:30 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191204234012.25f6159e@x1.home \
--to=alex.williamson@redhat.com \
--cc=Ken.Xue@amd.com \
--cc=Zhengxiao.zx@Alibaba-inc.com \
--cc=aik@ozlabs.ru \
--cc=changpeng.liu@intel.com \
--cc=cjia@nvidia.com \
--cc=cohuck@redhat.com \
--cc=dgilbert@redhat.com \
--cc=eauger@redhat.com \
--cc=eskultet@redhat.com \
--cc=felipe@nutanix.com \
--cc=jonathan.davies@nutanix.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=mlevitsk@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=shuangtai.tst@alibaba-inc.com \
--cc=yan.y.zhao@intel.com \
--cc=yi.l.liu@intel.com \
--cc=zhi.a.wang@intel.com \
--cc=ziye.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).