QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, cjia@nvidia.com, kvm@vger.kernel.org,
	eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org,
	cohuck@redhat.com, shuangtai.tst@alibaba-inc.com,
	dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com,
	pasic@linux.ibm.com, aik@ozlabs.ru, eauger@redhat.com,
	felipe@nutanix.com, jonathan.davies@nutanix.com,
	yan.y.zhao@intel.com, changpeng.liu@intel.com, Ken.Xue@amd.com
Subject: Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.
Date: Fri, 20 Mar 2020 12:59:10 -0600
Message-ID: <20200320125910.028d7af5@w520.home> (raw)
In-Reply-To: <cf0ee134-c1c7-f60c-afc2-8948268d8880@nvidia.com>

On Sat, 21 Mar 2020 00:12:04 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 3/20/2020 11:31 PM, Alex Williamson wrote:
> > On Fri, 20 Mar 2020 23:19:14 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 3/20/2020 4:27 AM, Alex Williamson wrote:  
> >>> On Fri, 20 Mar 2020 01:46:41 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>      
> 
> <snip>
> 
> >>>> +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
> >>>> +				  size_t size, uint64_t pgsize,
> >>>> +				  u64 __user *bitmap)
> >>>> +{
> >>>> +	struct vfio_dma *dma;
> >>>> +	unsigned long pgshift = __ffs(pgsize);
> >>>> +	unsigned int npages, bitmap_size;
> >>>> +
> >>>> +	dma = vfio_find_dma(iommu, iova, 1);
> >>>> +
> >>>> +	if (!dma)
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	if (dma->iova != iova || dma->size != size)
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	npages = dma->size >> pgshift;
> >>>> +	bitmap_size = DIRTY_BITMAP_BYTES(npages);
> >>>> +
> >>>> +	/* mark all pages dirty if all pages are pinned and mapped. */
> >>>> +	if (dma->iommu_mapped)
> >>>> +		bitmap_set(dma->bitmap, 0, npages);
> >>>> +
> >>>> +	if (copy_to_user((void __user *)bitmap, dma->bitmap, bitmap_size))
> >>>> +		return -EFAULT;  
> >>>
> >>> We still need to reset the bitmap here, clearing and re-adding the
> >>> pages that are still pinned.
> >>>
> >>> https://lore.kernel.org/kvm/20200319070635.2ff5db56@x1.home/
> >>>      
> >>
> >> I thought you agreed on my reply to it
> >> https://lore.kernel.org/kvm/31621b70-02a9-2ea5-045f-f72b671fe703@nvidia.com/
> >>  
> >>   > Why re-populate when there will be no change since
> >>   > vfio_iova_dirty_bitmap() is called holding iommu->lock? If there is any
> >>   > pin request while vfio_iova_dirty_bitmap() is still working, it will
> >>   > wait till iommu->lock is released. Bitmap will be populated when page is
> >>   > pinned.  
> > 
> > As coded, dirty bits are only ever set in the bitmap, never cleared.
> > If a page is unpinned between iterations of the user recording the
> > dirty bitmap, it should be marked dirty in the iteration immediately
> > after the unpinning and not marked dirty in the following iteration.
> > That doesn't happen here.  We're reporting cumulative dirty pages since
> > logging was enabled, we need to be reporting dirty pages since the user
> > last retrieved the dirty bitmap.  The bitmap should be cleared and
> > currently pinned pages re-added after copying to the user.  Thanks,
> >   
> 
> Does that mean, we have to track every iteration? do we really need that 
> tracking?
> 
> Generally the flow is:
> - vendor driver pin x pages
> - Enter pre-copy-phase where vCPUs are running - user starts dirty pages 
> tracking, then user asks dirty bitmap, x pages reported dirty by 
> VFIO_IOMMU_DIRTY_PAGES ioctl with _GET flag
> - In pre-copy phase, vendor driver pins y more pages, now bitmap 
> consists of x+y bits set
> - In pre-copy phase, vendor driver unpins z pages, but bitmap is not 
> updated, so again bitmap consists of x+y bits set.
> - Enter in stop-and-copy phase, vCPUs are stopped, mdev devices are stopped
> - user asks dirty bitmap - Since here vCPU and mdev devices are stopped, 
> pages should not get dirty by guest driver or the physical device. 
> Hence, x+y dirty pages would be reported.
> 
> I don't think we need to track every iteration of bitmap reporting.

Yes, once a bitmap is read, it's reset.  In your example, after
unpinning z pages the user should still see a bitmap with x+y pages,
but once they've read that bitmap, the next bitmap should be x+y-z.
Userspace can make decisions about when to switch from pre-copy to
stop-and-copy based on convergence, ie. the slope of the line recording
dirty pages per iteration.  The implementation here never allows an
inflection point, dirty pages reported through vfio would always either
be flat or climbing.  There might also be a case that an iommu backed
device could start pinning pages during the course of a migration, how
would the bitmap ever revert from fully populated to only tracking the
pinned pages?  Thanks,

Alex



  reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-19 20:16 [PATCH v15 Kernel 0/7] KABIs to support migration for VFIO devices Kirti Wankhede
2020-03-19 20:16 ` [PATCH v15 Kernel 1/7] vfio: KABI for migration interface for device state Kirti Wankhede
2020-03-23 20:30   ` Auger Eric
2020-03-24 19:31     ` Kirti Wankhede
2020-03-26  9:33   ` Christoph Hellwig
2020-03-26 21:39     ` Kirti Wankhede
2020-03-19 20:16 ` [PATCH v15 Kernel 2/7] vfio iommu: Remove atomicity of ref_count of pinned pages Kirti Wankhede
2020-03-23 20:30   ` Auger Eric
2020-03-24 19:34     ` Kirti Wankhede
2020-03-19 20:16 ` [PATCH v15 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking Kirti Wankhede
2020-03-23 21:11   ` Auger Eric
2020-03-24 19:49     ` Kirti Wankhede
2020-03-19 20:16 ` [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl " Kirti Wankhede
2020-03-19 22:57   ` Alex Williamson
2020-03-20 17:49     ` Kirti Wankhede
2020-03-20 18:01       ` Alex Williamson
2020-03-20 18:42         ` Kirti Wankhede
2020-03-20 18:59           ` Alex Williamson [this message]
2020-03-23 17:54             ` Kirti Wankhede
2020-03-23 18:44               ` Alex Williamson
2020-03-23 18:51                 ` Dr. David Alan Gilbert
2020-03-24  3:01                   ` Yan Zhao
2020-03-24  9:45                     ` Kirti Wankhede
2020-03-24 14:36                     ` Alex Williamson
2020-03-24 20:23                       ` Dr. David Alan Gilbert
2020-03-25  5:31                         ` Tian, Kevin
2020-03-19 20:16 ` [PATCH v15 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap Kirti Wankhede
2020-03-19 20:16 ` [PATCH v15 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support Kirti Wankhede
2020-03-19 20:16 ` [PATCH v15 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU backed device pins pages Kirti Wankhede

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200320125910.028d7af5@w520.home \
    --to=alex.williamson@redhat.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@Alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=changpeng.liu@intel.com \
    --cc=cjia@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=jonathan.davies@nutanix.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhi.a.wang@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git