KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Yan Zhao <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
	"cjia@nvidia.com" <cjia@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Yang, Ziye" <ziye.yang@intel.com>,
	"Liu, Changpeng" <changpeng.liu@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"eskultet@redhat.com" <eskultet@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
	"eauger@redhat.com" <eauger@redhat.com>,
	"aik@ozlabs.ru" <aik@ozlabs.ru>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"felipe@nutanix.com" <felipe@nutanix.com>,
	"Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
	"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
	"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
	"Wang, Zhi A" <zhi.a.wang@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.
Date: Wed, 4 Dec 2019 20:28:35 -0500
Message-ID: <20191205012835.GB31791@joy-OptiPlex-7040> (raw)
In-Reply-To: <20191204113457.16c1316d@x1.home>

On Thu, Dec 05, 2019 at 02:34:57AM +0800, Alex Williamson wrote:
> On Wed, 4 Dec 2019 23:40:25 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
> > On 12/3/2019 11:34 PM, Alex Williamson wrote:
> > > On Mon, 25 Nov 2019 19:57:39 -0500
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > >> On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:  
> > >>> On Fri, 15 Nov 2019 00:26:07 +0530
> > >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >>>      
> > >>>> On 11/14/2019 1:37 AM, Alex Williamson wrote:  
> > >>>>> On Thu, 14 Nov 2019 01:07:21 +0530
> > >>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >>>>>        
> > >>>>>> On 11/13/2019 4:00 AM, Alex Williamson wrote:  
> > >>>>>>> On Tue, 12 Nov 2019 22:33:37 +0530
> > >>>>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >>>>>>>           
> > >>>>>>>> All pages pinned by vendor driver through vfio_pin_pages API should be
> > >>>>>>>> considered as dirty during migration. IOMMU container maintains a list of
> > >>>>>>>> all such pinned pages. Added an ioctl defination to get bitmap of such  
> > >>>>>>>
> > >>>>>>> definition
> > >>>>>>>           
> > >>>>>>>> pinned pages for requested IO virtual address range.  
> > >>>>>>>
> > >>>>>>> Additionally, all mapped pages are considered dirty when physically
> > >>>>>>> mapped through to an IOMMU, modulo we discussed devices opting in to
> > >>>>>>> per page pinning to indicate finer granularity with a TBD mechanism to
> > >>>>>>> figure out if any non-opt-in devices remain.
> > >>>>>>>           
> > >>>>>>
> > >>>>>> You mean, in case of device direct assignment (device pass through)?  
> > >>>>>
> > >>>>> Yes, or IOMMU backed mdevs.  If vfio_dmas in the container are fully
> > >>>>> pinned and mapped, then the correct dirty page set is all mapped pages.
> > >>>>> We discussed using the vpfn list as a mechanism for vendor drivers to
> > >>>>> reduce their migration footprint, but we also discussed that we would
> > >>>>> need a way to determine that all participants in the container have
> > >>>>> explicitly pinned their working pages or else we must consider the
> > >>>>> entire potential working set as dirty.
> > >>>>>        
> > >>>>
> > >>>> How can vendor driver tell this capability to iommu module? Any suggestions?  
> > >>>
> > >>> I think it does so by pinning pages.  Is it acceptable that if the
> > >>> vendor driver pins any pages, then from that point forward we consider
> > >>> the IOMMU group dirty page scope to be limited to pinned pages?  There  
> > >> we should also be aware of that dirty page scope is pinned pages + unpinned pages,
> > >> which means ever since a page is pinned, it should be regarded as dirty
> > >> no matter whether it's unpinned later. only after log_sync is called and
> > >> dirty info retrieved, its dirty state should be cleared.  
> > > 
> > > Yes, good point.  We can't just remove a vpfn when a page is unpinned
> > > or else we'd lose information that the page potentially had been
> > > dirtied while it was pinned.  Maybe that vpfn needs to move to a dirty
> > > list and both the currently pinned vpfns and the dirty vpfns are walked
> > > on a log_sync.  The dirty vpfns list would be cleared after a log_sync.
> > > The container would need to know that dirty tracking is enabled and
> > > only manage the dirty vpfns list when necessary.  Thanks,
> > >   
> > 
> > If page is unpinned, then that page is available in free page pool for 
> > others to use, then how can we say that unpinned page has valid data?
> > 
> > If suppose, one driver A unpins a page and when driver B of some other 
> > device gets that page and he pins it, uses it, and then unpins it, then 
> > how can we say that page has valid data for driver A?
> > 
> > Can you give one example where unpinned page data is considered reliable 
> > and valid?
> 
> We can only pin pages that the user has already allocated* and mapped
> through the vfio DMA API.  The pinning of the page simply locks the
> page for the vendor driver to access it and unpinning that page only
> indicates that access is complete.  Pages are not freed when a vendor
> driver unpins them, they still exist and at this point we're now
> assuming the device dirtied the page while it was pinned.  Thanks,
> 
> Alex
> 
> * An exception here is that the page might be demand allocated and the
>   act of pinning the page could actually allocate the backing page for
>   the user if they have not faulted the page to trigger that allocation
>   previously.  That page remains mapped for the user's virtual address
>   space even after the unpinning though.
>

Yes, I can give an example in GVT.
when a gem_object is allocated in guest, before submitting it to guest
vGPU, gfx cmds in its ring buffer need to be pinned into GGTT to get a
global graphics address for hardware access. At that time, we shadow
those cmds and pin pages through vfio pin_pages(), and submit the shadow
gem_object to physial hardware.
After guest driver thinks the submitted gem_object has completed hardware
DMA, it unnpinnd those pinned GGTT graphics memory addresses. Then in
host, we unpin the shadow pages through vfio unpin_pages.
But, at this point, guest driver is still free to access the gem_object
through vCPUs, and guest user space is probably still mapping an object
into the gem_object in guest driver.
So, missing the dirty page tracking for unpinned pages would cause
data inconsitency.

Thanks
Yan

  reply index

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-12 17:03 [PATCH v9 Kernel 0/5] Add KABIs to support migration for VFIO devices Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13  3:23     ` Yan Zhao
2019-11-13 19:02       ` Kirti Wankhede
2019-11-14  0:36         ` Yan Zhao
2019-11-14 18:55           ` Kirti Wankhede
2019-11-13 10:24     ` Cornelia Huck
2019-11-13 18:27       ` Alex Williamson
2019-11-13 19:29         ` Kirti Wankhede
2019-11-13 19:48           ` Alex Williamson
2019-11-13 20:17             ` Kirti Wankhede
2019-11-13 20:40               ` Alex Williamson
2019-11-14 18:49                 ` Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13 19:37     ` Kirti Wankhede
2019-11-13 20:07       ` Alex Williamson
2019-11-14 18:56         ` Kirti Wankhede
2019-11-14 21:06           ` Alex Williamson
2019-11-15  2:40             ` Yan Zhao
2019-11-15  3:21               ` Alex Williamson
2019-11-15  5:10                 ` Tian, Kevin
2019-11-19 23:16                   ` Alex Williamson
2019-11-20  1:04                     ` Tian, Kevin
2019-11-20  1:51                 ` Yan Zhao
2019-11-26  0:57             ` Yan Zhao
2019-12-03 18:04               ` Alex Williamson
2019-12-04 18:10                 ` Kirti Wankhede
2019-12-04 18:34                   ` Alex Williamson
2019-12-05  1:28                     ` Yan Zhao [this message]
2019-12-05  5:42                       ` Kirti Wankhede
2019-12-05  5:47                         ` Yan Zhao
2019-12-05  5:56                         ` Alex Williamson
2019-12-05  6:19                           ` Kirti Wankhede
2019-12-05  6:40                             ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13 19:52     ` Kirti Wankhede
2019-11-13 20:22       ` Alex Williamson
2019-11-14 18:56         ` Kirti Wankhede
2019-11-14 21:08           ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191205012835.GB31791@joy-OptiPlex-7040 \
    --to=yan.y.zhao@intel.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@Alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=changpeng.liu@intel.com \
    --cc=cjia@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=jonathan.davies@nutanix.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhi.a.wang@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git