* Re: [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
@ 2020-05-05 18:49 kbuild test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2020-05-05 18:49 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 15398 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <1588607939-26441-6-git-send-email-kwankhede@nvidia.com>
References: <1588607939-26441-6-git-send-email-kwankhede@nvidia.com>
TO: Kirti Wankhede <kwankhede@nvidia.com>
TO: alex.williamson(a)redhat.com
TO: cjia(a)nvidia.com
CC: kevin.tian(a)intel.com
CC: ziye.yang(a)intel.com
CC: changpeng.liu(a)intel.com
CC: yi.l.liu(a)intel.com
CC: mlevitsk(a)redhat.com
CC: eskultet(a)redhat.com
CC: cohuck(a)redhat.com
CC: dgilbert(a)redhat.com
CC: jonathan.davies(a)nutanix.com

Hi Kirti,

I love your patch! Perhaps something to improve:

[auto build test WARNING on v5.7-rc4]
[also build test WARNING on next-20200505]
[cannot apply to vfio/next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch; please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Kirti-Wankhede/KABIs-to-support-migration-for-VFIO-devices/20200505-084649
base:    0e698dfa282211e414076f9dc7e83c1c288314fd
reproduce:
        # apt-get install sparse
        # sparse version: 
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
:::::: branch date: 18 hours ago
:::::: commit date: 18 hours ago

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   drivers/vfio/vfio_iommu_type1.c:1087:58: sparse: warning: incorrect type in argument 5 (different address spaces)
>> drivers/vfio/vfio_iommu_type1.c:1087:58: sparse:    expected unsigned long long [noderef] [usertype] <asn:1> *bitmap
>> drivers/vfio/vfio_iommu_type1.c:1087:58: sparse:    got unsigned long long [usertype] *
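
In plain terms, sparse is flagging an address-space mismatch:
vfio_iova_dirty_bitmap() expects a user-space bitmap pointer annotated
__user, while the call at line 1087 passes a cast kernel buffer (Alex makes
the same point in his review further down the thread).  A minimal
illustration of the rule sparse enforces, with invented function names and
assuming the callee ultimately does a copy_to_user():

#include <linux/types.h>
#include <linux/uaccess.h>

/* illustration only -- not kernel code */
static int report_bits(u64 __user *dst, const u64 *src, size_t bytes)
{
	/* a __user pointer may only be accessed via copy_to_user()/copy_from_user() */
	return copy_to_user(dst, src, bytes) ? -EFAULT : 0;
}

static void example(u64 __user *user_buf, u64 *kernel_buf, size_t bytes)
{
	report_bits(user_buf, kernel_buf, bytes);	/* clean: annotation matches */

	/*
	 * Casting to "u64 __user *" would silence sparse, but report_bits()
	 * would then run copy_to_user() against a kernel buffer -- the
	 * underlying bug the warning is hinting at.
	 */
	report_bits((u64 __user *)kernel_buf, kernel_buf, bytes);
}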

# https://github.com/0day-ci/linux/commit/5935872fe363ec9158528a84142d4f3c93da8383
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 5935872fe363ec9158528a84142d4f3c93da8383
vim +1087 drivers/vfio/vfio_iommu_type1.c

352e6a6499fe9c Kirti Wankhede  2020-05-04   984  
73fa0d10d077d9 Alex Williamson 2012-07-31   985  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
5935872fe363ec Kirti Wankhede  2020-05-04   986  			     struct vfio_iommu_type1_dma_unmap *unmap,
5935872fe363ec Kirti Wankhede  2020-05-04   987  			     struct vfio_bitmap *bitmap)
73fa0d10d077d9 Alex Williamson 2012-07-31   988  {
73fa0d10d077d9 Alex Williamson 2012-07-31   989  	uint64_t mask;
c086de818dd81c Kirti Wankhede  2016-11-17   990  	struct vfio_dma *dma, *dma_last = NULL;
1ef3e2bc04223f Alex Williamson 2014-02-26   991  	size_t unmapped = 0;
c086de818dd81c Kirti Wankhede  2016-11-17   992  	int ret = 0, retries = 0;
5935872fe363ec Kirti Wankhede  2020-05-04   993  	unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
73fa0d10d077d9 Alex Williamson 2012-07-31   994  
1ef3e2bc04223f Alex Williamson 2014-02-26   995  	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
73fa0d10d077d9 Alex Williamson 2012-07-31   996  
73fa0d10d077d9 Alex Williamson 2012-07-31   997  	if (unmap->iova & mask)
73fa0d10d077d9 Alex Williamson 2012-07-31   998  		return -EINVAL;
f5bfdbf252ad54 Alex Williamson 2013-06-25   999  	if (!unmap->size || unmap->size & mask)
73fa0d10d077d9 Alex Williamson 2012-07-31  1000  		return -EINVAL;
58fec830fc1920 Alex Williamson 2019-01-07  1001  	if (unmap->iova + unmap->size - 1 < unmap->iova ||
71a7d3d78e3ca5 Dan Carpenter   2017-10-20  1002  	    unmap->size > SIZE_MAX)
71a7d3d78e3ca5 Dan Carpenter   2017-10-20  1003  		return -EINVAL;
73fa0d10d077d9 Alex Williamson 2012-07-31  1004  
73fa0d10d077d9 Alex Williamson 2012-07-31  1005  	WARN_ON(mask & PAGE_MASK);
c086de818dd81c Kirti Wankhede  2016-11-17  1006  again:
73fa0d10d077d9 Alex Williamson 2012-07-31  1007  	mutex_lock(&iommu->lock);
73fa0d10d077d9 Alex Williamson 2012-07-31  1008  
1ef3e2bc04223f Alex Williamson 2014-02-26  1009  	/*
1ef3e2bc04223f Alex Williamson 2014-02-26  1010  	 * vfio-iommu-type1 (v1) - User mappings were coalesced together to
1ef3e2bc04223f Alex Williamson 2014-02-26  1011  	 * avoid tracking individual mappings.  This means that the granularity
1ef3e2bc04223f Alex Williamson 2014-02-26  1012  	 * of the original mapping was lost and the user was allowed to attempt
1ef3e2bc04223f Alex Williamson 2014-02-26  1013  	 * to unmap any range.  Depending on the contiguousness of physical
1ef3e2bc04223f Alex Williamson 2014-02-26  1014  	 * memory and page sizes supported by the IOMMU, arbitrary unmaps may
1ef3e2bc04223f Alex Williamson 2014-02-26  1015  	 * or may not have worked.  We only guaranteed unmap granularity
1ef3e2bc04223f Alex Williamson 2014-02-26  1016  	 * matching the original mapping; even though it was untracked here,
1ef3e2bc04223f Alex Williamson 2014-02-26  1017  	 * the original mappings are reflected in IOMMU mappings.  This
1ef3e2bc04223f Alex Williamson 2014-02-26  1018  	 * resulted in a couple unusual behaviors.  First, if a range is not
1ef3e2bc04223f Alex Williamson 2014-02-26  1019  	 * able to be unmapped, ex. a set of 4k pages that was mapped as a
1ef3e2bc04223f Alex Williamson 2014-02-26  1020  	 * 2M hugepage into the IOMMU, the unmap ioctl returns success but with
1ef3e2bc04223f Alex Williamson 2014-02-26  1021  	 * a zero sized unmap.  Also, if an unmap request overlaps the first
1ef3e2bc04223f Alex Williamson 2014-02-26  1022  	 * address of a hugepage, the IOMMU will unmap the entire hugepage.
1ef3e2bc04223f Alex Williamson 2014-02-26  1023  	 * This also returns success and the returned unmap size reflects the
1ef3e2bc04223f Alex Williamson 2014-02-26  1024  	 * actual size unmapped.
1ef3e2bc04223f Alex Williamson 2014-02-26  1025  	 *
1ef3e2bc04223f Alex Williamson 2014-02-26  1026  	 * We attempt to maintain compatibility with this "v1" interface, but
1ef3e2bc04223f Alex Williamson 2014-02-26  1027  	 * we take control out of the hands of the IOMMU.  Therefore, an unmap
1ef3e2bc04223f Alex Williamson 2014-02-26  1028  	 * request offset from the beginning of the original mapping will
1ef3e2bc04223f Alex Williamson 2014-02-26  1029  	 * return success with zero sized unmap.  And an unmap request covering
1ef3e2bc04223f Alex Williamson 2014-02-26  1030  	 * the first iova of mapping will unmap the entire range.
1ef3e2bc04223f Alex Williamson 2014-02-26  1031  	 *
1ef3e2bc04223f Alex Williamson 2014-02-26  1032  	 * The v2 version of this interface intends to be more deterministic.
1ef3e2bc04223f Alex Williamson 2014-02-26  1033  	 * Unmap requests must fully cover previous mappings.  Multiple
1ef3e2bc04223f Alex Williamson 2014-02-26  1034  	 * mappings may still be unmaped by specifying large ranges, but there
1ef3e2bc04223f Alex Williamson 2014-02-26  1035  	 * must not be any previous mappings bisected by the range.  An error
1ef3e2bc04223f Alex Williamson 2014-02-26  1036  	 * will be returned if these conditions are not met.  The v2 interface
1ef3e2bc04223f Alex Williamson 2014-02-26  1037  	 * will only return success and a size of zero if there were no
1ef3e2bc04223f Alex Williamson 2014-02-26  1038  	 * mappings within the range.
1ef3e2bc04223f Alex Williamson 2014-02-26  1039  	 */
1ef3e2bc04223f Alex Williamson 2014-02-26  1040  	if (iommu->v2) {
7c03f428464333 Kirti Wankhede  2016-12-06  1041  		dma = vfio_find_dma(iommu, unmap->iova, 1);
1ef3e2bc04223f Alex Williamson 2014-02-26  1042  		if (dma && dma->iova != unmap->iova) {
1ef3e2bc04223f Alex Williamson 2014-02-26  1043  			ret = -EINVAL;
1ef3e2bc04223f Alex Williamson 2014-02-26  1044  			goto unlock;
1ef3e2bc04223f Alex Williamson 2014-02-26  1045  		}
5935872fe363ec Kirti Wankhede  2020-05-04  1046  
1ef3e2bc04223f Alex Williamson 2014-02-26  1047  		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
1ef3e2bc04223f Alex Williamson 2014-02-26  1048  		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
1ef3e2bc04223f Alex Williamson 2014-02-26  1049  			ret = -EINVAL;
1ef3e2bc04223f Alex Williamson 2014-02-26  1050  			goto unlock;
1ef3e2bc04223f Alex Williamson 2014-02-26  1051  		}
1ef3e2bc04223f Alex Williamson 2014-02-26  1052  	}
1ef3e2bc04223f Alex Williamson 2014-02-26  1053  
5935872fe363ec Kirti Wankhede  2020-05-04  1054  	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
5935872fe363ec Kirti Wankhede  2020-05-04  1055  	     iommu->dirty_page_tracking) {
5935872fe363ec Kirti Wankhede  2020-05-04  1056  		final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
5935872fe363ec Kirti Wankhede  2020-05-04  1057  		if (!final_bitmap) {
5935872fe363ec Kirti Wankhede  2020-05-04  1058  			ret = -ENOMEM;
5935872fe363ec Kirti Wankhede  2020-05-04  1059  			goto unlock;
5935872fe363ec Kirti Wankhede  2020-05-04  1060  		}
5935872fe363ec Kirti Wankhede  2020-05-04  1061  
5935872fe363ec Kirti Wankhede  2020-05-04  1062  		temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
5935872fe363ec Kirti Wankhede  2020-05-04  1063  		if (!temp_bitmap) {
5935872fe363ec Kirti Wankhede  2020-05-04  1064  			ret = -ENOMEM;
5935872fe363ec Kirti Wankhede  2020-05-04  1065  			kfree(final_bitmap);
5935872fe363ec Kirti Wankhede  2020-05-04  1066  			goto unlock;
5935872fe363ec Kirti Wankhede  2020-05-04  1067  		}
5935872fe363ec Kirti Wankhede  2020-05-04  1068  	}
5935872fe363ec Kirti Wankhede  2020-05-04  1069  
166fd7d94afdac Alex Williamson 2013-06-21  1070  	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
1ef3e2bc04223f Alex Williamson 2014-02-26  1071  		if (!iommu->v2 && unmap->iova > dma->iova)
166fd7d94afdac Alex Williamson 2013-06-21  1072  			break;
8f0d5bb95f763c Kirti Wankhede  2016-11-17  1073  		/*
8f0d5bb95f763c Kirti Wankhede  2016-11-17  1074  		 * Task with same address space who mapped this iova range is
8f0d5bb95f763c Kirti Wankhede  2016-11-17  1075  		 * allowed to unmap the iova range.
8f0d5bb95f763c Kirti Wankhede  2016-11-17  1076  		 */
8f0d5bb95f763c Kirti Wankhede  2016-11-17  1077  		if (dma->task->mm != current->mm)
8f0d5bb95f763c Kirti Wankhede  2016-11-17  1078  			break;
c086de818dd81c Kirti Wankhede  2016-11-17  1079  
5935872fe363ec Kirti Wankhede  2020-05-04  1080  		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
5935872fe363ec Kirti Wankhede  2020-05-04  1081  		     iommu->dirty_page_tracking) {
5935872fe363ec Kirti Wankhede  2020-05-04  1082  			unsigned long pgshift = __ffs(bitmap->pgsize);
5935872fe363ec Kirti Wankhede  2020-05-04  1083  			unsigned int npages = dma->size >> pgshift;
5935872fe363ec Kirti Wankhede  2020-05-04  1084  			unsigned int shift;
5935872fe363ec Kirti Wankhede  2020-05-04  1085  
5935872fe363ec Kirti Wankhede  2020-05-04  1086  			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
5935872fe363ec Kirti Wankhede  2020-05-04 @1087  					bitmap->pgsize, (u64 *)temp_bitmap);
5935872fe363ec Kirti Wankhede  2020-05-04  1088  
5935872fe363ec Kirti Wankhede  2020-05-04  1089  			shift = (dma->iova - unmap->iova) >> pgshift;
5935872fe363ec Kirti Wankhede  2020-05-04  1090  			if (shift)
5935872fe363ec Kirti Wankhede  2020-05-04  1091  				bitmap_shift_left(temp_bitmap, temp_bitmap,
5935872fe363ec Kirti Wankhede  2020-05-04  1092  						  shift, npages);
5935872fe363ec Kirti Wankhede  2020-05-04  1093  			bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
5935872fe363ec Kirti Wankhede  2020-05-04  1094  				  shift + npages);
5935872fe363ec Kirti Wankhede  2020-05-04  1095  			memset(temp_bitmap, 0, bitmap->size);
5935872fe363ec Kirti Wankhede  2020-05-04  1096  		}
5935872fe363ec Kirti Wankhede  2020-05-04  1097  
c086de818dd81c Kirti Wankhede  2016-11-17  1098  		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
c086de818dd81c Kirti Wankhede  2016-11-17  1099  			struct vfio_iommu_type1_dma_unmap nb_unmap;
c086de818dd81c Kirti Wankhede  2016-11-17  1100  
c086de818dd81c Kirti Wankhede  2016-11-17  1101  			if (dma_last == dma) {
c086de818dd81c Kirti Wankhede  2016-11-17  1102  				BUG_ON(++retries > 10);
c086de818dd81c Kirti Wankhede  2016-11-17  1103  			} else {
c086de818dd81c Kirti Wankhede  2016-11-17  1104  				dma_last = dma;
c086de818dd81c Kirti Wankhede  2016-11-17  1105  				retries = 0;
c086de818dd81c Kirti Wankhede  2016-11-17  1106  			}
c086de818dd81c Kirti Wankhede  2016-11-17  1107  
c086de818dd81c Kirti Wankhede  2016-11-17  1108  			nb_unmap.iova = dma->iova;
c086de818dd81c Kirti Wankhede  2016-11-17  1109  			nb_unmap.size = dma->size;
c086de818dd81c Kirti Wankhede  2016-11-17  1110  
c086de818dd81c Kirti Wankhede  2016-11-17  1111  			/*
c086de818dd81c Kirti Wankhede  2016-11-17  1112  			 * Notify anyone (mdev vendor drivers) to invalidate and
c086de818dd81c Kirti Wankhede  2016-11-17  1113  			 * unmap iovas within the range we're about to unmap.
c086de818dd81c Kirti Wankhede  2016-11-17  1114  			 * Vendor drivers MUST unpin pages in response to an
c086de818dd81c Kirti Wankhede  2016-11-17  1115  			 * invalidation.
c086de818dd81c Kirti Wankhede  2016-11-17  1116  			 */
c086de818dd81c Kirti Wankhede  2016-11-17  1117  			mutex_unlock(&iommu->lock);
c086de818dd81c Kirti Wankhede  2016-11-17  1118  			blocking_notifier_call_chain(&iommu->notifier,
c086de818dd81c Kirti Wankhede  2016-11-17  1119  						    VFIO_IOMMU_NOTIFY_DMA_UNMAP,
c086de818dd81c Kirti Wankhede  2016-11-17  1120  						    &nb_unmap);
c086de818dd81c Kirti Wankhede  2016-11-17  1121  			goto again;
c086de818dd81c Kirti Wankhede  2016-11-17  1122  		}
1ef3e2bc04223f Alex Williamson 2014-02-26  1123  		unmapped += dma->size;
1ef3e2bc04223f Alex Williamson 2014-02-26  1124  		vfio_remove_dma(iommu, dma);
166fd7d94afdac Alex Williamson 2013-06-21  1125  	}
cd9b22685e4ccd Alex Williamson 2013-06-21  1126  
1ef3e2bc04223f Alex Williamson 2014-02-26  1127  unlock:
5935872fe363ec Kirti Wankhede  2020-05-04  1128  	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
5935872fe363ec Kirti Wankhede  2020-05-04  1129  	     iommu->dirty_page_tracking && final_bitmap) {
5935872fe363ec Kirti Wankhede  2020-05-04  1130  		if (copy_to_user((void __user *)bitmap->data, final_bitmap,
5935872fe363ec Kirti Wankhede  2020-05-04  1131  				 bitmap->size))
5935872fe363ec Kirti Wankhede  2020-05-04  1132  			ret = -EFAULT;
5935872fe363ec Kirti Wankhede  2020-05-04  1133  
5935872fe363ec Kirti Wankhede  2020-05-04  1134  		kfree(final_bitmap);
5935872fe363ec Kirti Wankhede  2020-05-04  1135  		kfree(temp_bitmap);
5935872fe363ec Kirti Wankhede  2020-05-04  1136  	}
5935872fe363ec Kirti Wankhede  2020-05-04  1137  
73fa0d10d077d9 Alex Williamson 2012-07-31  1138  	mutex_unlock(&iommu->lock);
166fd7d94afdac Alex Williamson 2013-06-21  1139  
1ef3e2bc04223f Alex Williamson 2014-02-26  1140  	/* Report how much was unmapped */
166fd7d94afdac Alex Williamson 2013-06-21  1141  	unmap->size = unmapped;
166fd7d94afdac Alex Williamson 2013-06-21  1142  
166fd7d94afdac Alex Williamson 2013-06-21  1143  	return ret;
166fd7d94afdac Alex Williamson 2013-06-21  1144  }
166fd7d94afdac Alex Williamson 2013-06-21  1145  
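
Restated outside the kernel bitmap helpers: the GET_DIRTY_BITMAP branch
above collects each mapping's dirty bits into temp_bitmap, shifts them left
by the mapping's page offset within the unmap range
((dma->iova - unmap->iova) >> pgshift), and ORs the result into
final_bitmap.  A small userspace sketch of that positioning step, with
illustrative names only:

#include <stdint.h>
#include <stddef.h>

/* set bit 'bit' in a byte-addressed bitmap */
static void set_bit_at(uint8_t *bitmap, size_t bit)
{
	bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/*
 * Merge one mapping's dirty bits into the bitmap covering the whole unmap
 * range: bit i of dma_bitmap (page i of the mapping) lands at bit
 * shift + i of final_bitmap, where shift is the mapping's starting page
 * offset within [unmap_iova, unmap_iova + unmap_size).
 */
static void merge_mapping_bits(uint8_t *final_bitmap, const uint8_t *dma_bitmap,
			       uint64_t unmap_iova, uint64_t dma_iova,
			       size_t npages, unsigned int pgshift)
{
	size_t shift = (size_t)((dma_iova - unmap_iova) >> pgshift);
	size_t i;

	for (i = 0; i < npages; i++)
		if (dma_bitmap[i / 8] & (1u << (i % 8)))
			set_bit_at(final_bitmap, shift + i);
}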

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  2020-05-12 20:30       ` Kirti Wankhede
@ 2020-05-12 21:21         ` Alex Williamson
  -1 siblings, 0 replies; 9+ messages in thread
From: Alex Williamson @ 2020-05-12 21:21 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: cjia, kevin.tian, ziye.yang, changpeng.liu, yi.l.liu, mlevitsk,
	eskultet, cohuck, dgilbert, jonathan.davies, eauger, aik, pasic,
	felipe, Zhengxiao.zx, shuangtai.tst, Ken.Xue, zhi.a.wang,
	yan.y.zhao, qemu-devel, kvm

On Wed, 13 May 2020 02:00:54 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 5/7/2020 3:55 AM, Alex Williamson wrote:
> > On Mon, 4 May 2020 21:28:57 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> DMA mapped pages, including those pinned by mdev vendor drivers, might
> >> get unpinned and unmapped while migration is active and device is still
> >> running. For example, in pre-copy phase while guest driver could access
> >> those pages, host device or vendor driver can dirty these mapped pages.
> >> Such pages should be marked dirty so as to maintain memory consistency
> >> for a user making use of dirty page tracking.
> >>
> >> To get bitmap during unmap, user should allocate memory for bitmap, set
> >> size of allocated memory, set page size to be considered for bitmap and
> >> set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
> >>
> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> >> Reviewed-by: Neo Jia <cjia@nvidia.com>
> >> ---
> >>   drivers/vfio/vfio_iommu_type1.c | 84 +++++++++++++++++++++++++++++++++++++++--
> >>   include/uapi/linux/vfio.h       | 10 +++++
> >>   2 files changed, 90 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 01dcb417836f..8b27faf1ec38 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -983,12 +983,14 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
> >>   }
> >>   
> >>   static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >> -			     struct vfio_iommu_type1_dma_unmap *unmap)
> >> +			     struct vfio_iommu_type1_dma_unmap *unmap,
> >> +			     struct vfio_bitmap *bitmap)
> >>   {
> >>   	uint64_t mask;
> >>   	struct vfio_dma *dma, *dma_last = NULL;
> >>   	size_t unmapped = 0;
> >>   	int ret = 0, retries = 0;
> >> +	unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
> >>   
> >>   	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> >>   
> >> @@ -1041,6 +1043,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >>   			ret = -EINVAL;
> >>   			goto unlock;
> >>   		}
> >> +
> >>   		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
> >>   		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
> >>   			ret = -EINVAL;
> >> @@ -1048,6 +1051,22 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >>   		}
> >>   	}
> >>   
> >> +	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> >> +	     iommu->dirty_page_tracking) {  
> > 
> > Why do we even accept VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP when not
> > dirty page tracking rather than returning -EINVAL?  It would simplify
> > things here to reject it at the ioctl and silently ignoring a flag is
> > rarely if ever the right approach.
> >   
> >> +		final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
> >> +		if (!final_bitmap) {
> >> +			ret = -ENOMEM;
> >> +			goto unlock;
> >> +		}
> >> +
> >> +		temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
> >> +		if (!temp_bitmap) {
> >> +			ret = -ENOMEM;
> >> +			kfree(final_bitmap);
> >> +			goto unlock;
> >> +		}  
> > 
> > YIKES!  So the user can instantly trigger the kernel to internally
> > allocate 2 x 256MB, regardless of how much they can actually map.
> >   
> 
> That is the worst-case scenario. I don't think it will ever be hit in practice.
> More comments on this below.

If a user has the ability to lock 8TB of memory, then yeah, triggering
the kernel to allocate 512MB might not be a big deal.  But in this case
a malicious user can trigger the kernel to allocate 512MB of memory
regardless of their locked memory limits.  I think that's unacceptable.
 
> >> +	}
> >> +
> >>   	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> >>   		if (!iommu->v2 && unmap->iova > dma->iova)
> >>   			break;
> >> @@ -1058,6 +1077,24 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >>   		if (dma->task->mm != current->mm)
> >>   			break;
> >>   
> >> +		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> >> +		     iommu->dirty_page_tracking) {
> >> +			unsigned long pgshift = __ffs(bitmap->pgsize);
> >> +			unsigned int npages = dma->size >> pgshift;
> >> +			unsigned int shift;
> >> +
> >> +			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> >> +					bitmap->pgsize, (u64 *)temp_bitmap);  
> > 
> > vfio_iova_dirty_bitmap() takes a __user bitmap, we're doing
> > copy_to_user() on a kernel allocated buffer???
> >   
> 
> Actually, there is no need to call vfio_iova_dirty_bitmap(); the dma pointer
> is known here, and since it is getting unmapped, there is no need to
> repopulate the bitmap. Removing vfio_iova_dirty_bitmap() and changing it as
> below:
> 
> if (unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
>      unsigned long pgshift = __ffs(bitmap->pgsize);
>      unsigned int npages = dma->size >> pgshift;
>      unsigned int bitmap_size = DIRTY_BITMAP_BYTES(npages);
>      unsigned int shift = (dma->iova - unmap->iova) >>
>                                              pgshift;
>      /*
>       * mark all pages dirty if all pages are pinned and
>       * mapped.
>       */
>      if (dma->iommu_mapped)
>          bitmap_set(temp_bitmap, 0, npages);
>      else
>          memcpy(temp_bitmap, dma->bitmap, bitmap_size);
> 
>      if (shift)
>          bitmap_shift_left(temp_bitmap, temp_bitmap,
>                            shift, npages);
>      bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
>                shift + npages);
>      memset(temp_bitmap, 0, bitmap->size);
> }

Solves that problem, but I think also illustrates that if we could
shift dma->bitmap in place we could avoid a memcpy and a memset.

> >> +
> >> +			shift = (dma->iova - unmap->iova) >> pgshift;
> >> +			if (shift)
> >> +				bitmap_shift_left(temp_bitmap, temp_bitmap,
> >> +						  shift, npages);
> >> +			bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
> >> +				  shift + npages);
> >> +			memset(temp_bitmap, 0, bitmap->size);
> >> +		}  
> > 
> > It seems like if the per vfio_dma dirty bitmap was oversized by a long
> > that we could shift it in place, then we'd only need one working bitmap
> > buffer and we could size that to fit the vfio_dma (or the largest
> > vfio_dma if we don't want to free and re-alloc for each vfio_dma).
> > We'd need to do more copy_to/from_user()s, but we'd also avoid copying
> > between sparse mappings (user zero'd bitmap required) and we'd have a
> > far more reasonable memory usage.  Thanks,
> >  
> 
> I thought about it, but couldn't optimize to use one bitmap buffer.
> This case will only hit during migration with vIOMMU enabled.
> Can we keep these 2 bitmap buffers for now and optimize it later?

I don't see how we can limit it to that use case, or why that use case
makes it any less important to avoid exploitable inefficiencies like
this.  We're also potentially changing the uapi from not requiring a
user zero'd buffer to requiring a user zero'd buffer, which means we'd
need to support a backwards compatible uapi, exposing a really
inefficient means for the kernel to zero user memory.

Can you identify what doesn't work about the above proposal?  TBH, once
we shift the dma->bitmap in place, we could make it so that we can
directly copy_to_user and we'd only need to fixup any unaligned
overlaps between mappings, which we could do with just a few bytes
rather than some large final_bitmap buffer.  Thanks,

Alex
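
A rough sketch of the in-place direction suggested above, under stated
assumptions: each per-vfio_dma bitmap is allocated one extra long so the
sub-long shift can happen in place, the helper name is invented, and a long
shared with an adjacent mapping at the boundary still needs the
read-modify-write fixup mentioned above rather than a blind copy:

#include <linux/bitmap.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* sketch only -- not the code proposed or merged */
static int copy_dma_dirty_bits(void __user *user_bitmap,
			       unsigned long *dma_bitmap, /* oversized by one long */
			       u64 unmap_iova, u64 dma_iova,
			       unsigned long npages, unsigned int pgshift)
{
	unsigned long shift = (dma_iova - unmap_iova) >> pgshift;
	unsigned long bit_offset = shift % BITS_PER_LONG;
	unsigned long copy_offset = (shift / BITS_PER_LONG) * sizeof(unsigned long);
	unsigned long copy_bytes;

	/* shift within the long-aligned window; the extra long absorbs the overflow */
	if (bit_offset)
		bitmap_shift_left(dma_bitmap, dma_bitmap, bit_offset,
				  npages + BITS_PER_LONG);

	copy_bytes = DIV_ROUND_UP(npages + bit_offset, BITS_PER_LONG) *
		     sizeof(unsigned long);

	/* copy straight to the user buffer at the mapping's byte offset */
	return copy_to_user((char __user *)user_bitmap + copy_offset,
			    dma_bitmap, copy_bytes) ? -EFAULT : 0;
}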


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  2020-05-06 22:25     ` Alex Williamson
@ 2020-05-12 20:30       ` Kirti Wankhede
  -1 siblings, 0 replies; 9+ messages in thread
From: Kirti Wankhede @ 2020-05-12 20:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: cjia, kevin.tian, ziye.yang, changpeng.liu, yi.l.liu, mlevitsk,
	eskultet, cohuck, dgilbert, jonathan.davies, eauger, aik, pasic,
	felipe, Zhengxiao.zx, shuangtai.tst, Ken.Xue, zhi.a.wang,
	yan.y.zhao, qemu-devel, kvm



On 5/7/2020 3:55 AM, Alex Williamson wrote:
> On Mon, 4 May 2020 21:28:57 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> DMA mapped pages, including those pinned by mdev vendor drivers, might
>> get unpinned and unmapped while migration is active and device is still
>> running. For example, in pre-copy phase while guest driver could access
>> those pages, host device or vendor driver can dirty these mapped pages.
>> Such pages should be marked dirty so as to maintain memory consistency
>> for a user making use of dirty page tracking.
>>
>> To get bitmap during unmap, user should allocate memory for bitmap, set
>> size of allocated memory, set page size to be considered for bitmap and
>> set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> Reviewed-by: Neo Jia <cjia@nvidia.com>
>> ---
>>   drivers/vfio/vfio_iommu_type1.c | 84 +++++++++++++++++++++++++++++++++++++++--
>>   include/uapi/linux/vfio.h       | 10 +++++
>>   2 files changed, 90 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 01dcb417836f..8b27faf1ec38 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -983,12 +983,14 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>>   }
>>   
>>   static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>> -			     struct vfio_iommu_type1_dma_unmap *unmap)
>> +			     struct vfio_iommu_type1_dma_unmap *unmap,
>> +			     struct vfio_bitmap *bitmap)
>>   {
>>   	uint64_t mask;
>>   	struct vfio_dma *dma, *dma_last = NULL;
>>   	size_t unmapped = 0;
>>   	int ret = 0, retries = 0;
>> +	unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
>>   
>>   	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>   
>> @@ -1041,6 +1043,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>   			ret = -EINVAL;
>>   			goto unlock;
>>   		}
>> +
>>   		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>>   		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>>   			ret = -EINVAL;
>> @@ -1048,6 +1051,22 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>   		}
>>   	}
>>   
>> +	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
>> +	     iommu->dirty_page_tracking) {
> 
> Why do we even accept VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP when not
> dirty page tracking rather than returning -EINVAL?  It would simplify
> things here to reject it at the ioctl and silently ignoring a flag is
> rarely if ever the right approach.
> 
>> +		final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
>> +		if (!final_bitmap) {
>> +			ret = -ENOMEM;
>> +			goto unlock;
>> +		}
>> +
>> +		temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
>> +		if (!temp_bitmap) {
>> +			ret = -ENOMEM;
>> +			kfree(final_bitmap);
>> +			goto unlock;
>> +		}
> 
> YIKES!  So the user can instantly trigger the kernel to internally
> allocate 2 x 256MB, regardless of how much they can actually map.
> 

That is the worst-case scenario. I don't think it will ever be hit in practice.
More comments on this below.

>> +	}
>> +
>>   	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>>   		if (!iommu->v2 && unmap->iova > dma->iova)
>>   			break;
>> @@ -1058,6 +1077,24 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>   		if (dma->task->mm != current->mm)
>>   			break;
>>   
>> +		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
>> +		     iommu->dirty_page_tracking) {
>> +			unsigned long pgshift = __ffs(bitmap->pgsize);
>> +			unsigned int npages = dma->size >> pgshift;
>> +			unsigned int shift;
>> +
>> +			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
>> +					bitmap->pgsize, (u64 *)temp_bitmap);
> 
> vfio_iova_dirty_bitmap() takes a __user bitmap, we're doing
> copy_to_user() on a kernel allocated buffer???
> 

Actually, there is no need to call vfio_iova_dirty_bitmap(); the dma pointer
is known here, and since it is getting unmapped, there is no need to
repopulate the bitmap. Removing vfio_iova_dirty_bitmap() and changing it as
below:

if (unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
     unsigned long pgshift = __ffs(bitmap->pgsize);
     unsigned int npages = dma->size >> pgshift;
     unsigned int bitmap_size = DIRTY_BITMAP_BYTES(npages);
     unsigned int shift = (dma->iova - unmap->iova) >>
                                             pgshift;
     /*
      * mark all pages dirty if all pages are pinned and
      * mapped.
      */
     if (dma->iommu_mapped)
         bitmap_set(temp_bitmap, 0, npages);
     else
         memcpy(temp_bitmap, dma->bitmap, bitmap_size);

     if (shift)
         bitmap_shift_left(temp_bitmap, temp_bitmap,
                           shift, npages);
     bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
               shift + npages);
     memset(temp_bitmap, 0, bitmap->size);
}

>> +
>> +			shift = (dma->iova - unmap->iova) >> pgshift;
>> +			if (shift)
>> +				bitmap_shift_left(temp_bitmap, temp_bitmap,
>> +						  shift, npages);
>> +			bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
>> +				  shift + npages);
>> +			memset(temp_bitmap, 0, bitmap->size);
>> +		}
> 
> It seems like if the per vfio_dma dirty bitmap was oversized by a long
> that we could shift it in place, then we'd only need one working bitmap
> buffer and we could size that to fit the vfio_dma (or the largest
> vfio_dma if we don't want to free and re-alloc for each vfio_dma).
> We'd need to do more copy_to/from_user()s, but we'd also avoid copying
> between sparse mappings (user zero'd bitmap required) and we'd have a
> far more reasonable memory usage.  Thanks,
>

I thought about it, but couldn't optimize to use one bitmap buffer.
This case will only hit during migration with vIOMMU enabled.
Can we keep these 2 bitmap buffers for now and optimize it later?

Thanks,
Kirti

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  2020-05-04 15:58   ` Kirti Wankhede
@ 2020-05-06 22:25     ` Alex Williamson
  -1 siblings, 0 replies; 9+ messages in thread
From: Alex Williamson @ 2020-05-06 22:25 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: cjia, kevin.tian, ziye.yang, changpeng.liu, yi.l.liu, mlevitsk,
	eskultet, cohuck, dgilbert, jonathan.davies, eauger, aik, pasic,
	felipe, Zhengxiao.zx, shuangtai.tst, Ken.Xue, zhi.a.wang,
	yan.y.zhao, qemu-devel, kvm

On Mon, 4 May 2020 21:28:57 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> DMA mapped pages, including those pinned by mdev vendor drivers, might
> get unpinned and unmapped while migration is active and device is still
> running. For example, in pre-copy phase while guest driver could access
> those pages, host device or vendor driver can dirty these mapped pages.
> Such pages should be marked dirty so as to maintain memory consistency
> for a user making use of dirty page tracking.
> 
> To get bitmap during unmap, user should allocate memory for bitmap, set
> size of allocated memory, set page size to be considered for bitmap and
> set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> Reviewed-by: Neo Jia <cjia@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 84 +++++++++++++++++++++++++++++++++++++++--
>  include/uapi/linux/vfio.h       | 10 +++++
>  2 files changed, 90 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 01dcb417836f..8b27faf1ec38 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -983,12 +983,14 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>  }
>  
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> -			     struct vfio_iommu_type1_dma_unmap *unmap)
> +			     struct vfio_iommu_type1_dma_unmap *unmap,
> +			     struct vfio_bitmap *bitmap)
>  {
>  	uint64_t mask;
>  	struct vfio_dma *dma, *dma_last = NULL;
>  	size_t unmapped = 0;
>  	int ret = 0, retries = 0;
> +	unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
>  
>  	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>  
> @@ -1041,6 +1043,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  			ret = -EINVAL;
>  			goto unlock;
>  		}
> +
>  		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>  		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>  			ret = -EINVAL;
> @@ -1048,6 +1051,22 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  		}
>  	}
>  
> +	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +	     iommu->dirty_page_tracking) {

Why do we even accept VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP when not
dirty page tracking rather than returning -EINVAL?  It would simplify
things here to reject it at the ioctl and silently ignoring a flag is
rarely if ever the right approach.

> +		final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
> +		if (!final_bitmap) {
> +			ret = -ENOMEM;
> +			goto unlock;
> +		}
> +
> +		temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
> +		if (!temp_bitmap) {
> +			ret = -ENOMEM;
> +			kfree(final_bitmap);
> +			goto unlock;
> +		}

YIKES!  So the user can instantly trigger the kernel to internally
allocate 2 x 256MB, regardless of how much they can actually map.

> +	}
> +
>  	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>  		if (!iommu->v2 && unmap->iova > dma->iova)
>  			break;
> @@ -1058,6 +1077,24 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  		if (dma->task->mm != current->mm)
>  			break;
>  
> +		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +		     iommu->dirty_page_tracking) {
> +			unsigned long pgshift = __ffs(bitmap->pgsize);
> +			unsigned int npages = dma->size >> pgshift;
> +			unsigned int shift;
> +
> +			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> +					bitmap->pgsize, (u64 *)temp_bitmap);

vfio_iova_dirty_bitmap() takes a __user bitmap, we're doing
copy_to_user() on a kernel allocated buffer???

> +
> +			shift = (dma->iova - unmap->iova) >> pgshift;
> +			if (shift)
> +				bitmap_shift_left(temp_bitmap, temp_bitmap,
> +						  shift, npages);
> +			bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
> +				  shift + npages);
> +			memset(temp_bitmap, 0, bitmap->size);
> +		}

It seems like if the per vfio_dma dirty bitmap was oversized by a long
that we could shift it in place, then we'd only need one working bitmap
buffer and we could size that to fit the vfio_dma (or the largest
vfio_dma if we don't want to free and re-alloc for each vfio_dma).
We'd need to do more copy_to/from_user()s, but we'd also avoid copying
between sparse mappings (user zero'd bitmap required) and we'd have a
far more reasonable memory usage.  Thanks,

Alex

> +
>  		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
>  			struct vfio_iommu_type1_dma_unmap nb_unmap;
>  
> @@ -1088,6 +1125,16 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  	}
>  
>  unlock:
> +	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +	     iommu->dirty_page_tracking && final_bitmap) {
> +		if (copy_to_user((void __user *)bitmap->data, final_bitmap,
> +				 bitmap->size))
> +			ret = -EFAULT;
> +
> +		kfree(final_bitmap);
> +		kfree(temp_bitmap);
> +	}
> +
>  	mutex_unlock(&iommu->lock);
>  
>  	/* Report how much was unmapped */
> @@ -2419,17 +2466,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>  		struct vfio_iommu_type1_dma_unmap unmap;
> -		long ret;
> +		struct vfio_bitmap bitmap = { 0 };
> +		int ret;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
>  
>  		if (copy_from_user(&unmap, (void __user *)arg, minsz))
>  			return -EFAULT;
>  
> -		if (unmap.argsz < minsz || unmap.flags)
> +		if (unmap.argsz < minsz ||
> +		    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
>  			return -EINVAL;
>  
> -		ret = vfio_dma_do_unmap(iommu, &unmap);
> +		if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
> +			unsigned long pgshift;
> +			size_t iommu_pgsize =
> +				(size_t)1 << __ffs(vfio_pgsize_bitmap(iommu));
> +
> +			if (unmap.argsz < (minsz + sizeof(bitmap)))
> +				return -EINVAL;
> +
> +			if (copy_from_user(&bitmap,
> +					   (void __user *)(arg + minsz),
> +					   sizeof(bitmap)))
> +				return -EFAULT;
> +
> +			/* allow only min supported pgsize */
> +			if (bitmap.pgsize != iommu_pgsize)
> +				return -EINVAL;
> +			if (!access_ok((void __user *)bitmap.data, bitmap.size))
> +				return -EINVAL;
> +
> +			pgshift = __ffs(bitmap.pgsize);
> +			ret = verify_bitmap_size(unmap.size >> pgshift,
> +						 bitmap.size);
> +			if (ret)
> +				return ret;
> +
> +		}
> +
> +		ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
>  		if (ret)
>  			return ret;
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5f359c63f5ef..e3cbf8b78623 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1048,12 +1048,22 @@ struct vfio_bitmap {
>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>   * or size different from those used in the original mapping call will
>   * succeed.
> + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap
> + * before unmapping IO virtual addresses. When this flag is set, user must
> + * provide data[] as structure vfio_bitmap. User must allocate memory to get
> + * bitmap and must set size of allocated memory in vfio_bitmap.size field.
> + * A bit in bitmap represents one page of user provided page size in 'pgsize',
> + * consecutively starting from iova offset. Bit set indicates page at that
> + * offset from iova is dirty. Bitmap of pages in the range of unmapped size is
> + * returned in vfio_bitmap.data
>   */
>  struct vfio_iommu_type1_dma_unmap {
>  	__u32	argsz;
>  	__u32	flags;
> +#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
>  	__u64	iova;				/* IO virtual address */
>  	__u64	size;				/* Size of mapping (bytes) */
> +	__u8    data[];
>  };
>  
>  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
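
For reference, a userspace sketch of how a caller would drive the interface
quoted above, assuming uapi headers updated with this series.  The helper
name and values are examples, error handling is trimmed, and the bitmap is
sized in 64-page multiples, which is what the series' verify_bitmap_size()
appears to require:

#include <linux/vfio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

static int unmap_and_get_dirty(int container_fd, uint64_t iova, uint64_t size,
			       uint64_t pgsize)
{
	uint64_t npages = size / pgsize;
	uint64_t bitmap_bytes = ((npages + 63) / 64) * 8; /* one bit per page */
	size_t argsz = sizeof(struct vfio_iommu_type1_dma_unmap) +
		       sizeof(struct vfio_bitmap);
	struct vfio_iommu_type1_dma_unmap *unmap = calloc(1, argsz);
	struct vfio_bitmap *bitmap = (struct vfio_bitmap *)unmap->data;
	void *dirty = calloc(1, bitmap_bytes);	/* filled in by the kernel */
	int ret;

	unmap->argsz = argsz;
	unmap->flags = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
	unmap->iova = iova;
	unmap->size = size;
	bitmap->pgsize = pgsize;	/* must be the IOMMU's minimum page size */
	bitmap->size = bitmap_bytes;
	bitmap->data = (__u64 *)dirty;

	ret = ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, unmap);
	/* on success, set bits in 'dirty' mark pages dirtied before the unmap */

	free(dirty);
	free(unmap);
	return ret;
}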


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
@ 2020-05-06 22:25     ` Alex Williamson
  0 siblings, 0 replies; 9+ messages in thread
From: Alex Williamson @ 2020-05-06 22:25 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Zhengxiao.zx, kevin.tian, yi.l.liu, cjia, kvm, eskultet,
	ziye.yang, qemu-devel, cohuck, shuangtai.tst, dgilbert,
	zhi.a.wang, mlevitsk, pasic, aik, eauger, felipe,
	jonathan.davies, yan.y.zhao, changpeng.liu, Ken.Xue

On Mon, 4 May 2020 21:28:57 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> DMA mapped pages, including those pinned by mdev vendor drivers, might
> get unpinned and unmapped while migration is active and device is still
> running. For example, in pre-copy phase while guest driver could access
> those pages, host device or vendor driver can dirty these mapped pages.
> Such pages should be marked dirty so as to maintain memory consistency
> for a user making use of dirty page tracking.
> 
> To get bitmap during unmap, user should allocate memory for bitmap, set
> size of allocated memory, set page size to be considered for bitmap and
> set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> Reviewed-by: Neo Jia <cjia@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 84 +++++++++++++++++++++++++++++++++++++++--
>  include/uapi/linux/vfio.h       | 10 +++++
>  2 files changed, 90 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 01dcb417836f..8b27faf1ec38 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -983,12 +983,14 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>  }
>  
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> -			     struct vfio_iommu_type1_dma_unmap *unmap)
> +			     struct vfio_iommu_type1_dma_unmap *unmap,
> +			     struct vfio_bitmap *bitmap)
>  {
>  	uint64_t mask;
>  	struct vfio_dma *dma, *dma_last = NULL;
>  	size_t unmapped = 0;
>  	int ret = 0, retries = 0;
> +	unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
>  
>  	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>  
> @@ -1041,6 +1043,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  			ret = -EINVAL;
>  			goto unlock;
>  		}
> +
>  		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>  		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>  			ret = -EINVAL;
> @@ -1048,6 +1051,22 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  		}
>  	}
>  
> +	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +	     iommu->dirty_page_tracking) {

Why do we even accept VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP when not
dirty page tracking rather than returning -EINVAL?  It would simplify
things here to reject it at the ioctl and silently ignoring a flag is
rarely if ever the right approach.

> +		final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
> +		if (!final_bitmap) {
> +			ret = -ENOMEM;
> +			goto unlock;
> +		}
> +
> +		temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
> +		if (!temp_bitmap) {
> +			ret = -ENOMEM;
> +			kfree(final_bitmap);
> +			goto unlock;
> +		}

YIKES!  So the user can instantly trigger the kernel to internally
allocate 2 x 256MB, regardless of how much they can actually map.

> +	}
> +
>  	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>  		if (!iommu->v2 && unmap->iova > dma->iova)
>  			break;
> @@ -1058,6 +1077,24 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  		if (dma->task->mm != current->mm)
>  			break;
>  
> +		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +		     iommu->dirty_page_tracking) {
> +			unsigned long pgshift = __ffs(bitmap->pgsize);
> +			unsigned int npages = dma->size >> pgshift;
> +			unsigned int shift;
> +
> +			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> +					bitmap->pgsize, (u64 *)temp_bitmap);

vfio_iova_dirty_bitmap() takes a __user bitmap, yet here it's handed a
kernel-allocated buffer, so its copy_to_user() is done on a kernel
pointer???  That copy should simply fail, and its return value isn't
even checked.

> +
> +			shift = (dma->iova - unmap->iova) >> pgshift;
> +			if (shift)
> +				bitmap_shift_left(temp_bitmap, temp_bitmap,
> +						  shift, npages);
> +			bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
> +				  shift + npages);
> +			memset(temp_bitmap, 0, bitmap->size);
> +		}

It seems like if the per vfio_dma dirty bitmap were oversized by a long
so that it could be shifted in place, we'd only need one working bitmap
buffer, and we could size that to fit the vfio_dma (or the largest
vfio_dma, if we don't want to free and re-alloc for each vfio_dma).
We'd need to do more copy_to/from_user()s, but we'd also avoid copying
between sparse mappings (which requires the user to provide a zeroed
bitmap) and we'd have far more reasonable memory usage.
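
A very rough, completely untested sketch of that, as a little helper per
vfio_dma (assume 'work' already holds this vfio_dma's dirty bits, sized
for npages = dma->size >> pgshift plus one spare long, and 'udata' is
the u64 __user * from vfio_bitmap.data):

	unsigned long bit_offset = (dma->iova - unmap->iova) >> pgshift;
	unsigned long copy_offset = bit_offset / BITS_PER_LONG;
	unsigned long shift = bit_offset % BITS_PER_LONG;
	unsigned long leftover = 0;

	if (shift) {
		/* align 'work' in place to the user bitmap's long boundary */
		bitmap_shift_left(work, work, shift, npages + shift);

		/* don't clobber bits already written for a previous vfio_dma */
		if (copy_from_user(&leftover, udata + copy_offset,
				   sizeof(leftover)))
			return -EFAULT;
		bitmap_or(work, work, &leftover, shift);
	}

	if (copy_to_user(udata + copy_offset, work,
			 ALIGN(npages + shift, BITS_PER_LONG) / BITS_PER_BYTE))
		return -EFAULT;

Thanks,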

Alex

> +
>  		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
>  			struct vfio_iommu_type1_dma_unmap nb_unmap;
>  
> @@ -1088,6 +1125,16 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  	}
>  
>  unlock:
> +	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +	     iommu->dirty_page_tracking && final_bitmap) {
> +		if (copy_to_user((void __user *)bitmap->data, final_bitmap,
> +				 bitmap->size))
> +			ret = -EFAULT;
> +
> +		kfree(final_bitmap);
> +		kfree(temp_bitmap);
> +	}
> +
>  	mutex_unlock(&iommu->lock);
>  
>  	/* Report how much was unmapped */
> @@ -2419,17 +2466,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>  		struct vfio_iommu_type1_dma_unmap unmap;
> -		long ret;
> +		struct vfio_bitmap bitmap = { 0 };
> +		int ret;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
>  
>  		if (copy_from_user(&unmap, (void __user *)arg, minsz))
>  			return -EFAULT;
>  
> -		if (unmap.argsz < minsz || unmap.flags)
> +		if (unmap.argsz < minsz ||
> +		    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
>  			return -EINVAL;
>  
> -		ret = vfio_dma_do_unmap(iommu, &unmap);
> +		if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
> +			unsigned long pgshift;
> +			size_t iommu_pgsize =
> +				(size_t)1 << __ffs(vfio_pgsize_bitmap(iommu));
> +
> +			if (unmap.argsz < (minsz + sizeof(bitmap)))
> +				return -EINVAL;
> +
> +			if (copy_from_user(&bitmap,
> +					   (void __user *)(arg + minsz),
> +					   sizeof(bitmap)))
> +				return -EFAULT;
> +
> +			/* allow only min supported pgsize */
> +			if (bitmap.pgsize != iommu_pgsize)
> +				return -EINVAL;
> +			if (!access_ok((void __user *)bitmap.data, bitmap.size))
> +				return -EINVAL;
> +
> +			pgshift = __ffs(bitmap.pgsize);
> +			ret = verify_bitmap_size(unmap.size >> pgshift,
> +						 bitmap.size);
> +			if (ret)
> +				return ret;
> +
> +		}
> +
> +		ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
>  		if (ret)
>  			return ret;
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5f359c63f5ef..e3cbf8b78623 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1048,12 +1048,22 @@ struct vfio_bitmap {
>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>   * or size different from those used in the original mapping call will
>   * succeed.
> + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap
> + * before unmapping IO virtual addresses. When this flag is set, user must
> + * provide data[] as structure vfio_bitmap. User must allocate memory to get
> + * bitmap and must set size of allocated memory in vfio_bitmap.size field.
> + * A bit in bitmap represents one page of user provided page size in 'pgsize',
> + * consecutively starting from iova offset. Bit set indicates page at that
> + * offset from iova is dirty. Bitmap of pages in the range of unmapped size is
> + * returned in vfio_bitmap.data
>   */
>  struct vfio_iommu_type1_dma_unmap {
>  	__u32	argsz;
>  	__u32	flags;
> +#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
>  	__u64	iova;				/* IO virtual address */
>  	__u64	size;				/* Size of mapping (bytes) */
> +	__u8    data[];
>  };
>  
>  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
  2020-05-04 15:58 [PATCH Kernel v18 0/7] KABIs to support migration for VFIO devices Kirti Wankhede
@ 2020-05-04 15:58   ` Kirti Wankhede
  0 siblings, 0 replies; 9+ messages in thread
From: Kirti Wankhede @ 2020-05-04 15:58 UTC (permalink / raw)
  To: alex.williamson, cjia
  Cc: kevin.tian, ziye.yang, changpeng.liu, yi.l.liu, mlevitsk,
	eskultet, cohuck, dgilbert, jonathan.davies, eauger, aik, pasic,
	felipe, Zhengxiao.zx, shuangtai.tst, Ken.Xue, zhi.a.wang,
	yan.y.zhao, qemu-devel, kvm, Kirti Wankhede

DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and the device is
still running. For example, during the pre-copy phase, while the guest
driver can still access those pages, the host device or vendor driver
can dirty these mapped pages. Such pages should be marked dirty so as to
maintain memory consistency for a user making use of dirty page
tracking.

To get the bitmap during unmap, the user should allocate memory for the
bitmap, set the size of the allocated memory, set the page size to be
considered for the bitmap, and set the flag
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
---
 drivers/vfio/vfio_iommu_type1.c | 84 +++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       | 10 +++++
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 01dcb417836f..8b27faf1ec38 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -983,12 +983,14 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-			     struct vfio_iommu_type1_dma_unmap *unmap)
+			     struct vfio_iommu_type1_dma_unmap *unmap,
+			     struct vfio_bitmap *bitmap)
 {
 	uint64_t mask;
 	struct vfio_dma *dma, *dma_last = NULL;
 	size_t unmapped = 0;
 	int ret = 0, retries = 0;
+	unsigned long *final_bitmap = NULL, *temp_bitmap = NULL;
 
 	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
 
@@ -1041,6 +1043,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			ret = -EINVAL;
 			goto unlock;
 		}
+
 		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
 		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
 			ret = -EINVAL;
@@ -1048,6 +1051,22 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 		}
 	}
 
+	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+	     iommu->dirty_page_tracking) {
+		final_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
+		if (!final_bitmap) {
+			ret = -ENOMEM;
+			goto unlock;
+		}
+
+		temp_bitmap = kvzalloc(bitmap->size, GFP_KERNEL);
+		if (!temp_bitmap) {
+			ret = -ENOMEM;
+			kfree(final_bitmap);
+			goto unlock;
+		}
+	}
+
 	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
 		if (!iommu->v2 && unmap->iova > dma->iova)
 			break;
@@ -1058,6 +1077,24 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 		if (dma->task->mm != current->mm)
 			break;
 
+		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+		     iommu->dirty_page_tracking) {
+			unsigned long pgshift = __ffs(bitmap->pgsize);
+			unsigned int npages = dma->size >> pgshift;
+			unsigned int shift;
+
+			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+					bitmap->pgsize, (u64 *)temp_bitmap);
+
+			shift = (dma->iova - unmap->iova) >> pgshift;
+			if (shift)
+				bitmap_shift_left(temp_bitmap, temp_bitmap,
+						  shift, npages);
+			bitmap_or(final_bitmap, final_bitmap, temp_bitmap,
+				  shift + npages);
+			memset(temp_bitmap, 0, bitmap->size);
+		}
+
 		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
 			struct vfio_iommu_type1_dma_unmap nb_unmap;
 
@@ -1088,6 +1125,16 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	}
 
 unlock:
+	if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+	     iommu->dirty_page_tracking && final_bitmap) {
+		if (copy_to_user((void __user *)bitmap->data, final_bitmap,
+				 bitmap->size))
+			ret = -EFAULT;
+
+		kfree(final_bitmap);
+		kfree(temp_bitmap);
+	}
+
 	mutex_unlock(&iommu->lock);
 
 	/* Report how much was unmapped */
@@ -2419,17 +2466,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
 		struct vfio_iommu_type1_dma_unmap unmap;
-		long ret;
+		struct vfio_bitmap bitmap = { 0 };
+		int ret;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
 		if (copy_from_user(&unmap, (void __user *)arg, minsz))
 			return -EFAULT;
 
-		if (unmap.argsz < minsz || unmap.flags)
+		if (unmap.argsz < minsz ||
+		    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
 			return -EINVAL;
 
-		ret = vfio_dma_do_unmap(iommu, &unmap);
+		if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+			unsigned long pgshift;
+			size_t iommu_pgsize =
+				(size_t)1 << __ffs(vfio_pgsize_bitmap(iommu));
+
+			if (unmap.argsz < (minsz + sizeof(bitmap)))
+				return -EINVAL;
+
+			if (copy_from_user(&bitmap,
+					   (void __user *)(arg + minsz),
+					   sizeof(bitmap)))
+				return -EFAULT;
+
+			/* allow only min supported pgsize */
+			if (bitmap.pgsize != iommu_pgsize)
+				return -EINVAL;
+			if (!access_ok((void __user *)bitmap.data, bitmap.size))
+				return -EINVAL;
+
+			pgshift = __ffs(bitmap.pgsize);
+			ret = verify_bitmap_size(unmap.size >> pgshift,
+						 bitmap.size);
+			if (ret)
+				return ret;
+
+		}
+
+		ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
 		if (ret)
 			return ret;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5f359c63f5ef..e3cbf8b78623 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1048,12 +1048,22 @@ struct vfio_bitmap {
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
  * succeed.
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap
+ * before unmapping IO virtual addresses. When this flag is set, user must
+ * provide data[] as structure vfio_bitmap. User must allocate memory to get
+ * bitmap and must set size of allocated memory in vfio_bitmap.size field.
+ * A bit in bitmap represents one page of user provided page size in 'pgsize',
+ * consecutively starting from iova offset. Bit set indicates page at that
+ * offset from iova is dirty. Bitmap of pages in the range of unmapped size is
+ * returned in vfio_bitmap.data
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
+	__u8    data[];
 };
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread


end of thread, other threads:[~2020-05-12 21:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
2020-05-05 18:49 [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap kbuild test robot
  -- strict thread matches above, loose matches on Subject: below --
2020-05-04 15:58 [PATCH Kernel v18 0/7] KABIs to support migration for VFIO devices Kirti Wankhede
2020-05-04 15:58 ` [PATCH Kernel v18 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap Kirti Wankhede
2020-05-04 15:58   ` Kirti Wankhede
2020-05-06 22:25   ` Alex Williamson
2020-05-06 22:25     ` Alex Williamson
2020-05-12 20:30     ` Kirti Wankhede
2020-05-12 20:30       ` Kirti Wankhede
2020-05-12 21:21       ` Alex Williamson
2020-05-12 21:21         ` Alex Williamson
