From: "Zhao, Yan Y" <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>,
	"cjia@nvidia.com" <cjia@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Yang, Ziye" <ziye.yang@intel.com>,
	"Liu, Changpeng" <changpeng.liu@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"eskultet@redhat.com" <eskultet@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
	"eauger@redhat.com" <eauger@redhat.com>,
	"aik@ozlabs.ru" <aik@ozlabs.ru>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"felipe@nutanix.com" <felipe@nutanix.com>,
	"Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
	"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
	"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
	"Wang, Zhi A" <zhi.a.wang@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.
Date: Tue, 31 Mar 2020 02:40:24 +0000	[thread overview]
Message-ID: <F22B14EC3CFBB843AD3E03B6B78F2C6A4C4B1627@shsmsx102.ccr.corp.intel.com> (raw)
In-Reply-To: <20200330203816.4c26ae53@x1.home>



> -----Original Message-----
> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On Behalf
> Of Alex Williamson
> Sent: Tuesday, March 31, 2020 10:38 AM
> To: Zhao, Yan Y <yan.y.zhao@intel.com>
> Cc: Kirti Wankhede <kwankhede@nvidia.com>; cjia@nvidia.com; Tian, Kevin
> <kevin.tian@intel.com>; Yang, Ziye <ziye.yang@intel.com>; Liu, Changpeng
> <changpeng.liu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> mlevitsk@redhat.com; eskultet@redhat.com; cohuck@redhat.com;
> dgilbert@redhat.com; jonathan.davies@nutanix.com; eauger@redhat.com;
> aik@ozlabs.ru; pasic@linux.ibm.com; felipe@nutanix.com;
> Zhengxiao.zx@Alibaba-inc.com; shuangtai.tst@alibaba-inc.com;
> Ken.Xue@amd.com; Wang, Zhi A <zhi.a.wang@intel.com>; qemu-
> devel@nongnu.org; kvm@vger.kernel.org
> Subject: Re: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for
> dirty pages tracking.
> 
> On Mon, 30 Mar 2020 21:16:21 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, Mar 31, 2020 at 09:12:59AM +0800, Alex Williamson wrote:
> > > On Mon, 30 Mar 2020 20:50:47 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >
> > > > On Tue, Mar 31, 2020 at 08:53:47AM +0800, Alex Williamson wrote:
> > > > > On Mon, 30 Mar 2020 19:51:31 -0400 Yan Zhao
> > > > > <yan.y.zhao@intel.com> wrote:
> > > > >
> > > > > > On Mon, Mar 30, 2020 at 09:49:21PM +0800, Kirti Wankhede wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 3/30/2020 8:54 AM, Yan Zhao wrote:
> > > > > > > > On Fri, Mar 27, 2020 at 01:28:13PM +0800, Kirti Wankhede wrote:
> > > > > > > >> Hit the send button a little early.
> > > > > > > >>
> > > > > > > >>   >
> > > > > > > >>   > I checked v12, it's not like what I said.
> > > > > > > >>   > In v12, bitmaps are generated per vfio_dma, and
> combination of the
> > > > > > > >>   > bitmaps are required in order to generate a big bitmap
> suiting for dirty
> > > > > > > >>   > query. It can cause problem when offset not aligning.
> > > > > > > >>   > But what I propose here is to generate an rb tree orthogonal
> to the tree
> > > > > > > >>   > of vfio_dma.
> > > > > > > >>   >
> > > > > > > >>   > as to CPU cycles saving, I don't think iterating/translating
> page by page
> > > > > > > >>   > would achieve that purpose.
> > > > > > > >>   >
> > > > > > > >>
> > > > > > > >> Instead of creating one extra rb tree for dirty pages
> > > > > > > >> tracking in v10 tried to use dma->pfn_list itself, we
> > > > > > > >> tried changes in v10, v11 and v12, latest version is
> > > > > > > >> evolved version with best possible approach after discussion.
> Probably, go through v11 as well.
> > > > > > > >> https://patchwork.kernel.org/patch/11298335/
> > > > > > > >>
> > > > > > > > I'm not sure why all those previous implementations are
> > > > > > > > bound to vfio_dma. With vIOMMU on, in most cases, a
> > > > > > > > vfio_dma is only for a page, so generating a one-byte bitmap for
> a single page in each vfio_dma ?
> > > > > > > > is it possible to create one extra rb tree to keep dirty
> > > > > > > > ranges, and one fixed length kernel bitmap whose content
> > > > > > > > is generated on query, serving as a bounce buffer for
> > > > > > > > copy_to_user
> > > > > > > >
> > > > > > >
> > > > > > > One fixed length? What should the fixed value be? Then isn't it
> > > > > > > better to fix the size to dma->size?
> > > > > > >
> > > > > > > This is also to prevent a DoS attack: a user space application
> > > > > > > could query a very large range.
> > > > > > >
> > > > > > > >>
> > > > > > > >> On 3/27/2020 6:00 AM, Yan Zhao wrote:
> > > > > > > >>> On Fri, Mar 27, 2020 at 05:39:01AM +0800, Kirti Wankhede
> wrote:
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On 3/25/2020 7:41 AM, Yan Zhao wrote:
> > > > > > > >>>>> On Wed, Mar 25, 2020 at 05:18:52AM +0800, Kirti
> Wankhede wrote:
> > > > > > > >>>>>> VFIO_IOMMU_DIRTY_PAGES ioctl performs three
> operations:
> > > > > > > >>>>>> - Start dirty pages tracking while migration is
> > > > > > > >>>>>> active
> > > > > > > >>>>>> - Stop dirty pages tracking.
> > > > > > > >>>>>> - Get dirty pages bitmap. It's the user space application's
> responsibility to
> > > > > > > >>>>>>      copy content of dirty pages from source to destination
> during migration.
> > > > > > > >>>>>>
> > > > > > > >>>>>> To prevent DoS attack, memory for bitmap is allocated
> > > > > > > >>>>>> per vfio_dma structure. Bitmap size is calculated
> > > > > > > >>>>>> considering smallest supported page size. Bitmap is
> > > > > > > >>>>>> allocated for all vfio_dmas when dirty logging is
> > > > > > > >>>>>> enabled
> > > > > > > >>>>>>
> > > > > > > >>>>>> Bitmap is populated for already pinned pages when
> > > > > > > >>>>>> bitmap is allocated for a vfio_dma with the smallest
> > > > > > > >>>>>> supported page size. Update bitmap from pinning
> > > > > > > >>>>>> functions when tracking is enabled. When user
> > > > > > > >>>>>> application queries bitmap, check if requested page
> > > > > > > >>>>>> size is same as page size used to populate the bitmap. If it is
> equal, copy bitmap, but if not equal, return error.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> > > > > > > >>>>>> Reviewed-by: Neo Jia <cjia@nvidia.com>
> > > > > > > >>>>>> ---
> > > > > > > >>>>>>     drivers/vfio/vfio_iommu_type1.c | 266
> +++++++++++++++++++++++++++++++++++++++-
> > > > > > > >>>>>>     1 file changed, 260 insertions(+), 6 deletions(-)
> > > > > > > >>>>>>
> > > > > > > >>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > > > >>>>>> b/drivers/vfio/vfio_iommu_type1.c index
> > > > > > > >>>>>> 70aeab921d0f..874a1a7ae925 100644
> > > > > > > >>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > > > >>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > > > >>>>>> @@ -71,6 +71,7 @@ struct vfio_iommu {
> > > > > > > >>>>>>     	unsigned int		dma_avail;
> > > > > > > >>>>>>     	bool			v2;
> > > > > > > >>>>>>     	bool			nesting;
> > > > > > > >>>>>> +	bool			dirty_page_tracking;
> > > > > > > >>>>>>     };
> > > > > > > >>>>>>
> > > > > > > >>>>>>     struct vfio_domain { @@ -91,6 +92,7 @@ struct
> > > > > > > >>>>>> vfio_dma {
> > > > > > > >>>>>>     	bool			lock_cap;	/*
> capable(CAP_IPC_LOCK) */
> > > > > > > >>>>>>     	struct task_struct	*task;
> > > > > > > >>>>>>     	struct rb_root		pfn_list;	/* Ex-user
> pinned pfn list */
> > > > > > > >>>>>> +	unsigned long		*bitmap;
> > > > > > > >>>>>>     };
> > > > > > > >>>>>>
> > > > > > > >>>>>>     struct vfio_group { @@ -125,7 +127,21 @@ struct
> > > > > > > >>>>>> vfio_regions {
> > > > > > > >>>>>>     #define
> IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
> > > > > > > >>>>>>
> 	(!list_empty(&iommu->domain_list))
> > > > > > > >>>>>>
> > > > > > > >>>>>> +#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n,
> BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
> > > > > > > >>>>>> +
> > > > > > > >>>>>> +/*
> > > > > > > >>>>>> + * Input argument of number of bits to bitmap_set()
> > > > > > > >>>>>> +is unsigned integer, which
> > > > > > > >>>>>> + * further casts to signed integer for unaligned
> > > > > > > >>>>>> +multi-bit operation,
> > > > > > > >>>>>> + * __bitmap_set().
> > > > > > > >>>>>> + * Then maximum bitmap size supported is 2^31 bits
> > > > > > > >>>>>> +divided by 2^3 bits/byte,
> > > > > > > >>>>>> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 =
> > > > > > > >>>>>> +2^43 (8TB) on 4K page
> > > > > > > >>>>>> + * system.
> > > > > > > >>>>>> + */
> > > > > > > >>>>>> +#define DIRTY_BITMAP_PAGES_MAX
> 	(uint64_t)(INT_MAX - 1)
> > > > > > > >>>>>> +#define DIRTY_BITMAP_SIZE_MAX
> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> > > > > > > >>>>>> +
> > > > > > > >>>>>>     static int put_pfn(unsigned long pfn, int prot);
> > > > > > > >>>>>> +static unsigned long vfio_pgsize_bitmap(struct
> > > > > > > >>>>>> +vfio_iommu *iommu);
> > > > > > > >>>>>>
> > > > > > > >>>>>>     /*
> > > > > > > >>>>>>      * This code handles mapping and unmapping of
> > > > > > > >>>>>> user data buffers @@ -175,6 +191,77 @@ static void
> vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
> > > > > > > >>>>>>     	rb_erase(&old->node, &iommu->dma_list);
> > > > > > > >>>>>>     }
> > > > > > > >>>>>>
> > > > > > > >>>>>> +
> > > > > > > >>>>>> +static int vfio_dma_bitmap_alloc(struct vfio_dma
> > > > > > > >>>>>> +*dma, uint64_t pgsize) {
> > > > > > > >>>>>> +	uint64_t npages = dma->size / pgsize;
> > > > > > > >>>>>> +
> > > > > > > > If pgsize > dma->size, npages = 0.
> > > > > > > > wouldn't it cause problem?
> > > > > > > >
> > > > > > >
> > > > > > > This patch-set supports bitmap for smallest supported page size,
> i.e.
> > > > > > > PAGE_SIZE. vfio_dma_do_map() validates dma->size
> > > > > > > accordingly. So this case will not happen.
> > > > > > >
> > > > > > as far as I know, qemu/kvm uses 4k as the unit for dirty page
> tracking.
> > > > > > so why is the smallest iommu page size used here?
> > > > > > wouldn't it cause problem?
> > > > >
> > > > > If your concern is that the IOMMU supports sub-4K page sizes,
> > > > > see vfio_pgsize_bitmap().  We actually only support PAGE_SIZE as
> > > > > our minimum mapping unit, even if the IOMMU supports less, so
> PAGE_SIZE is
> > > > > our lower bound.  Thanks,
> > > >
> > > > if we always use PAGE_SIZE, why not use PAGE_SIZE directly?
> > > > or returning dirty bitmap unit (e.g. 1 <<
> > > > __ffs(vfio_pgsize_bitmap(iommu))) to QEMU in
> > > > VFIO_IOMMU_DIRTY_PAGES_FLAG_START, so that qemu can do
> possible conversion if it's not the same unit that QEMU uses.
> > >
> > > The vfio interface is essentially just an extension of the IOMMU API
> > > via domain->pgsize_bitmap.  intel-iommu mostly made the bitmask
> > > meaningless by reporting essentially PAGE_MASK, and we just expose
> > > the common version of that across potentially all the IOMMUs used by
> > > the domain, modulo minimum of PAGE_SIZE.  Thanks,
> >
> > ok. got it. do you think it's good to return this iommu page size when
> > turning on dirty page tracking? so when GET_BITMAP ioctl comes, we
> > don't need to quit if range.bitmap.pgsize != iommu_pgsize.
> > instead, the GET_BITMAP can succeed with iommu page size and qemu
> does
> > the bitmap conversion afterwards.
> 
> The bitmap granularity is already exposed to the user via
> vfio_iommu_type1_info.iova_pgsizes in the VFIO_IOMMU_GET_INFO ioctl.
> Our original intention was to allow the user to specify the dirty bitmap page
> size, which is still enabled in the ioctl via bitmap.pgsize, but for simplification
> we currently only support the smallest page size.  This could be something
> else we expose via the extension interface when more page sizes are
> supported.  Thanks,
> 
Ok, got it. Thanks.



Thread overview: 52+ messages
2020-03-24 21:18 [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking Kirti Wankhede
2020-03-25  2:11 ` Yan Zhao
2020-03-26 21:39   ` Kirti Wankhede
2020-03-27  0:30     ` Yan Zhao
2020-03-27  5:07       ` Kirti Wankhede
2020-03-30  2:07         ` Yan Zhao
2020-03-30 20:47           ` Alex Williamson
2020-03-30 23:49             ` Yan Zhao
2020-03-27  5:28       ` Kirti Wankhede
2020-03-30  3:24         ` Yan Zhao
2020-03-30 13:49           ` Kirti Wankhede
2020-03-30 23:51             ` Yan Zhao
2020-03-31  0:53               ` Alex Williamson
2020-03-31  0:50                 ` Yan Zhao
2020-03-31  1:12                   ` Alex Williamson
2020-03-31  1:16                     ` Yan Zhao
2020-03-31  2:38                       ` Alex Williamson
2020-03-31  2:40                         ` Zhao, Yan Y [this message]
2020-03-27 11:57 ` Dr. David Alan Gilbert
2020-03-27 13:57   ` Alex Williamson
2020-03-27 14:09     ` Dr. David Alan Gilbert
  -- strict thread matches above, loose matches on Subject: below --
2020-03-24 19:32 [PATCH v16 Kernel 0/7] KABIs to support migration for VFIO devices Kirti Wankhede
2020-03-24 19:32 ` [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking Kirti Wankhede
2020-03-24 20:37   ` Alex Williamson
2020-03-24 20:45     ` Alex Williamson
2020-03-24 21:48       ` Kirti Wankhede
2020-03-26  8:17   ` kbuild test robot
2020-03-26 18:47   ` kbuild test robot
