KVM Archive on lore.kernel.org
From: Yan Zhao <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
	"cjia@nvidia.com" <cjia@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Yang, Ziye" <ziye.yang@intel.com>,
	"Liu, Changpeng" <changpeng.liu@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"eskultet@redhat.com" <eskultet@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
	"eauger@redhat.com" <eauger@redhat.com>,
	"aik@ozlabs.ru" <aik@ozlabs.ru>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"felipe@nutanix.com" <felipe@nutanix.com>,
	"Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
	"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
	"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
	"Wang, Zhi A" <zhi.a.wang@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH v14 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
Date: Sun, 22 Mar 2020 21:10:41 -0400
Message-ID: <20200323011041.GB5456@joy-OptiPlex-7040>
In-Reply-To: <20200320132821.2fe80c29@w520.home>

On Sat, Mar 21, 2020 at 03:28:21AM +0800, Alex Williamson wrote:
> On Sat, 21 Mar 2020 00:44:32 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
> > On 3/20/2020 9:17 PM, Alex Williamson wrote:
> > > On Fri, 20 Mar 2020 09:40:39 -0600
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > >   
> > >> On Fri, 20 Mar 2020 04:35:29 -0400
> > >> Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >>  
> > >>> On Thu, Mar 19, 2020 at 03:41:12AM +0800, Kirti Wankhede wrote:  
> > >>>> DMA mapped pages, including those pinned by mdev vendor drivers, might
> > >>>> get unpinned and unmapped while migration is active and device is still
> > >>>> running. For example, in pre-copy phase while guest driver could access
> > >>>> those pages, host device or vendor driver can dirty these mapped pages.
> > >>>> Such pages should be marked dirty so as to maintain memory consistency
> > >>>> for a user making use of dirty page tracking.
> > >>>>
> > >>>> To get bitmap during unmap, user should set flag
> > >>>> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
> > >>>> zeroed by user space application. Bitmap size and page size should be set
> > >>>> by user application.
> > >>>>
> > >>>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> > >>>> Reviewed-by: Neo Jia <cjia@nvidia.com>
> > >>>> ---
> > >>>>   drivers/vfio/vfio_iommu_type1.c | 55 ++++++++++++++++++++++++++++++++++++++---
> > >>>>   include/uapi/linux/vfio.h       | 11 +++++++++
> > >>>>   2 files changed, 62 insertions(+), 4 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > >>>> index d6417fb02174..aa1ac30f7854 100644
> > >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> > >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> > >>>> @@ -939,7 +939,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
> > >>>>   }
> > >>>>   
> > >>>>   static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> > >>>> -			     struct vfio_iommu_type1_dma_unmap *unmap)
> > >>>> +			     struct vfio_iommu_type1_dma_unmap *unmap,
> > >>>> +			     struct vfio_bitmap *bitmap)
> > >>>>   {
> > >>>>   	uint64_t mask;
> > >>>>   	struct vfio_dma *dma, *dma_last = NULL;
> > >>>> @@ -990,6 +991,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> > >>>>   	 * will be returned if these conditions are not met.  The v2 interface
> > >>>>   	 * will only return success and a size of zero if there were no
> > >>>>   	 * mappings within the range.
> > >>>> +	 *
> > >>>> +	 * When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
> > >>>> +	 * must be for single mapping. Multiple mappings with this flag set is
> > >>>> +	 * not supported.
> > >>>>   	 */
> > >>>>   	if (iommu->v2) {
> > >>>>   		dma = vfio_find_dma(iommu, unmap->iova, 1);
> > >>>> @@ -997,6 +1002,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> > >>>>   			ret = -EINVAL;
> > >>>>   			goto unlock;
> > >>>>   		}
> > >>>> +
> > >>>> +		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> > >>>> +		    (dma->iova != unmap->iova || dma->size != unmap->size)) {  
> > >>> dma is probably NULL here!  
> > >>
> > >> Yep, I didn't look closely enough there.  This is situated right
> > >> between the check to make sure we're not bisecting a mapping at the
> > >> start of the unmap and the check to make sure we're not bisecting a
> > >> mapping at the end of the unmap.  There's no guarantee that we have a
> > >> valid pointer here.  The test should be in the while() loop below this
> > >> code.  
> > > 
> > > Actually the test could remain here, we can exit here if we can't find
> > > a dma at the start of the unmap range with the GET_DIRTY_BITMAP flag,
> > > but we absolutely cannot deref dma without testing it.
> > >   
> > 
> > In the check just above the newly added check, if dma is NULL then it's an
> > error condition, because unmap requests must fully cover previous mappings, right?
> 
> Yes, but we'll do a null pointer deref before we return error.
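To make the ordering concrete, here is a minimal sketch of the check with the
pointer tested before it is dereferenced. It is not the actual kernel code:
struct dma_model and MODEL_FLAG_GET_DIRTY_BITMAP are hypothetical stand-ins for
struct vfio_dma and VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, and locking and the
rest of vfio_dma_do_unmap() are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_dma; only the fields used here. */
struct dma_model {
	uint64_t iova;
	uint64_t size;
};

/* Hypothetical stand-in for VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP. */
#define MODEL_FLAG_GET_DIRTY_BITMAP (1u << 0)

/*
 * 'dma' is the mapping found at the start of the unmap range, or NULL if
 * there is none. Returns 0 if the unmap may proceed, -EINVAL otherwise.
 */
static int check_single_mapping(const struct dma_model *dma, uint32_t flags,
				uint64_t iova, uint64_t size)
{
	/* The restriction only applies when the bitmap is requested. */
	if (!(flags & MODEL_FLAG_GET_DIRTY_BITMAP))
		return 0;

	/* Test the pointer before any dereference. */
	if (!dma)
		return -EINVAL;

	/* With the flag set, the request must match exactly one mapping. */
	if (dma->iova != iova || dma->size != size)
		return -EINVAL;

	return 0;
}
```

With this ordering, a lookup that finds no mapping returns -EINVAL instead of
dereferencing a NULL pointer.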
>  
> > >>> And this restriction on UNMAP would make some UNMAP operations of vIOMMU
> > >>> fail.
> > >>>
> > >>> e.g. below condition indeed happens in reality.
> > >>> an UNMAP ioctl comes for IOVA range from 0xff800000, of size 0x200000
> > >>> However, IOVAs in this range are mapped page by page, i.e., dma->size is 0x1000.
> > >>>
> > >>> Previously, this UNMAP ioctl could unmap successfully as a whole.
> > >>
> > >> What triggers this in the guest?  Note that it's only when using the
> > >> GET_DIRTY_BITMAP flag that this is restricted.  Does the event you're
> > >> referring to potentially occur under normal circumstances in that mode?
> > >> Thanks,
> > >>  

It happens during vIOMMU domain-level invalidation of the IOTLB
(domain-selective invalidation, see vtd_iotlb_domain_invalidate() in QEMU).
This is common in VT-d lazy mode and does NOT happen just once at boot time.
Rather than invalidating page by page, the guest batches the page invalidations.
So by the time the invalidation takes place, even the higher-level page tables
have become invalid, and a bigger combined range has to be invalidated.
That's why we see IOVAs mapped in 4K pages but unmapped in 2M
ranges.

I think those UNMAPs should also have the GET_DIRTY_BITMAP flag set, right?
> > 
> > Such unmap would callback vfio_iommu_map_notify() in QEMU. In 
> > vfio_iommu_map_notify(), unmap is called on same range <iova, 
> > iotlb->addr_mask + 1> which was used for map. Secondly unmap with bitmap 
> > will be called only when device state has _SAVING flag set.
> 
In this case, iotlb->addr_mask in the unmap is 0x200000 - 1,
which differs from the 0x1000 - 1 used for the map.
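As a rough illustration of the mismatch, the arithmetic can be sketched as
below. The helper names range_size() and mappings_spanned() are made up for
this example and do not exist in QEMU or the kernel; the only assumption taken
from the code under discussion is that the range size is addr_mask + 1.

```c
#include <stdint.h>

/* Size of the range described by an IOTLB address mask (mask = size - 1). */
static uint64_t range_size(uint64_t addr_mask)
{
	return addr_mask + 1;
}

/*
 * Number of equally sized mappings that a single unmap spans, given the
 * address mask used for the unmap and the one used for each map.
 */
static uint64_t mappings_spanned(uint64_t unmap_mask, uint64_t map_mask)
{
	return range_size(unmap_mask) / range_size(map_mask);
}
```

For the masks above, mappings_spanned(0x200000 - 1, 0x1000 - 1) is 512, so one
such unmap covers 512 separate 4K mappings and cannot satisfy a check that
requires the request to match a single mapping.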
> It might be helpful for Yan, and everyone else, to see the latest QEMU
> patch series.  Thanks,
>
Yes, please. I'm also curious about the log_sync part for vIOMMU, given that
most IOVAs in the address space are unmapped and therefore no IOTLB entries
can be found for them.

Thanks
Yan

