All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: <pbonzini@redhat.com>, <kraxel@redhat.com>, <cjia@nvidia.com>,
	<qemu-devel@nongnu.org>, <kvm@vger.kernel.org>,
	<kevin.tian@intel.com>, <jike.song@intel.com>,
	<bjsdjshi@linux.vnet.ibm.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v12 10/22] vfio iommu type1: Add support for mediated devices
Date: Tue, 15 Nov 2016 20:39:25 +0530	[thread overview]
Message-ID: <bc5b3724-6a82-15db-1a72-a42e74403c6f@nvidia.com> (raw)
In-Reply-To: <20161114162547.734a2a16@t450s.home>



On 11/15/2016 4:55 AM, Alex Williamson wrote:
> On Mon, 14 Nov 2016 21:12:24 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>> managed by an IOMMU domain.
>>
>> Aim of this change is:
>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>> - To support direct assigned device and mediated device in single module
>>
>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>> backend module. More details:
>> - Domain for external user is tracked separately in vfio_iommu structure.
>>   It is allocated when group for first mdev device is attached.
>> - Pages pinned for external domain are tracked in each vfio_dma structure
>>   for that iova range.
>> - Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of
>>   rb-tree is iova, but it actually aims to track pfns.
>> - On external pin request for an iova, page is pinned only once, if iova
>>   is already pinned and tracked, ref_count is incremented.
> 
> This is referring only to the external (ie. pfn_list) tracking only,
> correct?  In general, a page can be pinned up to twice per iova
> referencing it, once for an iommu mapped domain and again in the
> pfn_list, right?
> 

That's right.

...
  		}
>>  
>> -		if (!rsvd && !lock_cap && mm->locked_vm + i + 1 > limit) {
>> -			put_pfn(pfn, prot);
>> +		iova = vaddr - dma->vaddr + dma->iova;
> 
> 
> nit, this could be incremented in the for loop just like vaddr, we
> don't need to create it from scratch (iova += PAGE_SIZE).
> 

Ok.

> 
>> +		vpfn = vfio_find_vpfn(dma, iova);
>> +		if (!vpfn)
>> +			lock_acct++;
>> +
>> +		if (!rsvd && !lock_cap &&
>> +		    mm->locked_vm + lock_acct + 1 > limit) {
> 
> 
> lock_acct was incremented just above, why is this lock_acct + 1?  I
> think we're off by one here (ie. remove the +1)?
> 

Moved this vfio_find_vpfn() check after this check.


...


>> -static long vfio_unpin_pages_remote(struct vfio_dma *dma, unsigned long pfn,
>> -				    long npage, int prot, bool do_accounting)
>> +static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
>> +				    unsigned long pfn, long npage,
>> +				    bool do_accounting)
>>  {
>> -	unsigned long unlocked = 0;
>> +	long unlocked = 0, locked = 0;
>>  	long i;
>>  
>> -	for (i = 0; i < npage; i++)
>> -		unlocked += put_pfn(pfn++, prot);
>> +	for (i = 0; i < npage; i++) {
>> +		struct vfio_pfn *vpfn;
>> +
>> +		unlocked += put_pfn(pfn++, dma->prot);
>> +
>> +		vpfn = vfio_find_vpfn(dma, iova + (i << PAGE_SHIFT));
>> +		if (vpfn)
>> +			locked++;
> 
> This isn't taking into account reserved pages (ex. device mmaps).  In
> the pinning path above we skip accounting of reserved pages, put_pfn
> also only increments for non-reserved pages, but vfio_find_vpfn()
> doesn't care, so it's possible that (locked - unlocked) could result in
> a positive value.  Maybe something like:
> 
> if (put_pfn(...)) {
> 	unlocked++;
> 	if (vfio_find_vpfn(...))
> 		locked++;
> }
> 

Updating.

...
>>  
>> -static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +static int vfio_iommu_type1_pin_pages(void *iommu_data,
>> +				      unsigned long *user_pfn,
>> +				      int npage, int prot,
>> +				      unsigned long *phys_pfn)
>> +{
>> +	struct vfio_iommu *iommu = iommu_data;
>> +	int i, j, ret;
>> +	unsigned long remote_vaddr;
>> +	unsigned long *pfn = phys_pfn;
> 
> nit, why do we have this variable rather than using phys_pfn directly?
> Maybe before we had unwind support we incremented this rather than
> indexing it?
> 

Updating.

...

>> +		iova = user_pfn[i] << PAGE_SHIFT;
>> +		dma = vfio_find_dma(iommu, iova, 0);
>> +		if (!dma)
>> +			goto unpin_exit;
>> +		unlocked += vfio_unpin_page_external(dma, iova, do_accounting);
>> +	}
>> +
>> +unpin_exit:
>> +	mutex_unlock(&iommu->lock);
>> +	return unlocked;
> 
> This is not returning what it's supposed to.  unlocked is going to be
> less than or equal to the number of pages unpinned.  We don't need to
> track unlocked, I think we're just tracking where we are in the unpin
> array, whether it was partial or complete.  I think we want:
> 
> return i > npage ? npage : i;
> 
> Or maybe we can make it more obvious if there's an error on the first
> unpin entry:
> 
> return i > npage ? npage : (i > 0 ? i : -EINVAL);
> 

Thanks for pointing that out. Updating.

...
>> +static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
>> +{
>> +	struct rb_node *n, *p;
>> +
>> +	n = rb_first(&iommu->dma_list);
>> +	for (; n; n = rb_next(n)) {
>> +		struct vfio_dma *dma;
>> +		long locked = 0, unlocked = 0;
>> +
>> +		dma = rb_entry(n, struct vfio_dma, node);
>> +		unlocked += vfio_unmap_unpin(iommu, dma, false);
>> +		p = rb_first(&dma->pfn_list);
>> +		for (; p; p = rb_next(p))
>> +			locked++;
> 
> We don't know that these weren't reserved pages.  If the vendor driver
> was pinning peer-to-peer ranges vfio_unmap_unpin() might have returned
> 0 yet we're assuming each is locked RAM, so our accounting can go
> upside down.
> 

Adding if rsvd check here.

>> +		vfio_lock_acct(dma->task, locked - unlocked);
>> +	}
>> +}
>> +
>> +static void vfio_external_unpin_all(struct vfio_iommu *iommu,
>> +				    bool do_accounting)
>> +{
>> +	struct rb_node *n, *p;
>> +
>> +	n = rb_first(&iommu->dma_list);
>> +	for (; n; n = rb_next(n)) {
>> +		struct vfio_dma *dma;
>> +
>> +		dma = rb_entry(n, struct vfio_dma, node);
>> +		while ((p = rb_first(&dma->pfn_list))) {
>> +			int unlocked;
>> +			struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
>> +							 node);
>> +
>> +			unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
>> +			if (do_accounting)
>> +				vfio_lock_acct(dma->task, -unlocked);
> 
> nit, we could batch these further, only updating accounting once per
> vfio_dma, or once entirely.
> 

Ok.

Thanks,
Kirti

WARNING: multiple messages have this Message-ID (diff)
From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
	qemu-devel@nongnu.org, kvm@vger.kernel.org, kevin.tian@intel.com,
	jike.song@intel.com, bjsdjshi@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v12 10/22] vfio iommu type1: Add support for mediated devices
Date: Tue, 15 Nov 2016 20:39:25 +0530	[thread overview]
Message-ID: <bc5b3724-6a82-15db-1a72-a42e74403c6f@nvidia.com> (raw)
In-Reply-To: <20161114162547.734a2a16@t450s.home>



On 11/15/2016 4:55 AM, Alex Williamson wrote:
> On Mon, 14 Nov 2016 21:12:24 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>> managed by an IOMMU domain.
>>
>> Aim of this change is:
>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>> - To support direct assigned device and mediated device in single module
>>
>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>> backend module. More details:
>> - Domain for external user is tracked separately in vfio_iommu structure.
>>   It is allocated when group for first mdev device is attached.
>> - Pages pinned for external domain are tracked in each vfio_dma structure
>>   for that iova range.
>> - Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of
>>   rb-tree is iova, but it actually aims to track pfns.
>> - On external pin request for an iova, page is pinned only once, if iova
>>   is already pinned and tracked, ref_count is incremented.
> 
> This is referring only to the external (ie. pfn_list) tracking only,
> correct?  In general, a page can be pinned up to twice per iova
> referencing it, once for an iommu mapped domain and again in the
> pfn_list, right?
> 

That's right.

...
  		}
>>  
>> -		if (!rsvd && !lock_cap && mm->locked_vm + i + 1 > limit) {
>> -			put_pfn(pfn, prot);
>> +		iova = vaddr - dma->vaddr + dma->iova;
> 
> 
> nit, this could be incremented in the for loop just like vaddr, we
> don't need to create it from scratch (iova += PAGE_SIZE).
> 

Ok.

> 
>> +		vpfn = vfio_find_vpfn(dma, iova);
>> +		if (!vpfn)
>> +			lock_acct++;
>> +
>> +		if (!rsvd && !lock_cap &&
>> +		    mm->locked_vm + lock_acct + 1 > limit) {
> 
> 
> lock_acct was incremented just above, why is this lock_acct + 1?  I
> think we're off by one here (ie. remove the +1)?
> 

Moved this vfio_find_vpfn() check after this check.


...


>> -static long vfio_unpin_pages_remote(struct vfio_dma *dma, unsigned long pfn,
>> -				    long npage, int prot, bool do_accounting)
>> +static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
>> +				    unsigned long pfn, long npage,
>> +				    bool do_accounting)
>>  {
>> -	unsigned long unlocked = 0;
>> +	long unlocked = 0, locked = 0;
>>  	long i;
>>  
>> -	for (i = 0; i < npage; i++)
>> -		unlocked += put_pfn(pfn++, prot);
>> +	for (i = 0; i < npage; i++) {
>> +		struct vfio_pfn *vpfn;
>> +
>> +		unlocked += put_pfn(pfn++, dma->prot);
>> +
>> +		vpfn = vfio_find_vpfn(dma, iova + (i << PAGE_SHIFT));
>> +		if (vpfn)
>> +			locked++;
> 
> This isn't taking into account reserved pages (ex. device mmaps).  In
> the pinning path above we skip accounting of reserved pages, put_pfn
> also only increments for non-reserved pages, but vfio_find_vpfn()
> doesn't care, so it's possible that (locked - unlocked) could result in
> a positive value.  Maybe something like:
> 
> if (put_pfn(...)) {
> 	unlocked++;
> 	if (vfio_find_vpfn(...))
> 		locked++;
> }
> 

Updating.

...
>>  
>> -static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +static int vfio_iommu_type1_pin_pages(void *iommu_data,
>> +				      unsigned long *user_pfn,
>> +				      int npage, int prot,
>> +				      unsigned long *phys_pfn)
>> +{
>> +	struct vfio_iommu *iommu = iommu_data;
>> +	int i, j, ret;
>> +	unsigned long remote_vaddr;
>> +	unsigned long *pfn = phys_pfn;
> 
> nit, why do we have this variable rather than using phys_pfn directly?
> Maybe before we had unwind support we incremented this rather than
> indexing it?
> 

Updating.

...

>> +		iova = user_pfn[i] << PAGE_SHIFT;
>> +		dma = vfio_find_dma(iommu, iova, 0);
>> +		if (!dma)
>> +			goto unpin_exit;
>> +		unlocked += vfio_unpin_page_external(dma, iova, do_accounting);
>> +	}
>> +
>> +unpin_exit:
>> +	mutex_unlock(&iommu->lock);
>> +	return unlocked;
> 
> This is not returning what it's supposed to.  unlocked is going to be
> less than or equal to the number of pages unpinned.  We don't need to
> track unlocked, I think we're just tracking where we are in the unpin
> array, whether it was partial or complete.  I think we want:
> 
> return i > npage ? npage : i;
> 
> Or maybe we can make it more obvious if there's an error on the first
> unpin entry:
> 
> return i > npage ? npage : (i > 0 ? i : -EINVAL);
> 

Thanks for pointing that out. Updating.

...
>> +static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
>> +{
>> +	struct rb_node *n, *p;
>> +
>> +	n = rb_first(&iommu->dma_list);
>> +	for (; n; n = rb_next(n)) {
>> +		struct vfio_dma *dma;
>> +		long locked = 0, unlocked = 0;
>> +
>> +		dma = rb_entry(n, struct vfio_dma, node);
>> +		unlocked += vfio_unmap_unpin(iommu, dma, false);
>> +		p = rb_first(&dma->pfn_list);
>> +		for (; p; p = rb_next(p))
>> +			locked++;
> 
> We don't know that these weren't reserved pages.  If the vendor driver
> was pinning peer-to-peer ranges vfio_unmap_unpin() might have returned
> 0 yet we're assuming each is locked RAM, so our accounting can go
> upside down.
> 

Adding if rsvd check here.

>> +		vfio_lock_acct(dma->task, locked - unlocked);
>> +	}
>> +}
>> +
>> +static void vfio_external_unpin_all(struct vfio_iommu *iommu,
>> +				    bool do_accounting)
>> +{
>> +	struct rb_node *n, *p;
>> +
>> +	n = rb_first(&iommu->dma_list);
>> +	for (; n; n = rb_next(n)) {
>> +		struct vfio_dma *dma;
>> +
>> +		dma = rb_entry(n, struct vfio_dma, node);
>> +		while ((p = rb_first(&dma->pfn_list))) {
>> +			int unlocked;
>> +			struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
>> +							 node);
>> +
>> +			unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
>> +			if (do_accounting)
>> +				vfio_lock_acct(dma->task, -unlocked);
> 
> nit, we could batch these further, only updating accounting once per
> vfio_dma, or once entirely.
> 

Ok.

Thanks,
Kirti

  reply	other threads:[~2016-11-15 15:11 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-14 15:42 [PATCH v12 00/22] Add Mediated device support Kirti Wankhede
2016-11-14 15:42 ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 01/22] vfio: Mediated device Core driver Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-15  8:30   ` Dong Jia Shi
2016-11-15  8:30   ` [Qemu-devel] " Dong Jia Shi
2016-11-15 15:16     ` Kirti Wankhede
2016-11-15 15:16       ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 02/22] vfio: VFIO based driver for Mediated devices Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 03/22] vfio: Rearrange functions to get vfio_group from dev Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 04/22] vfio: Common function to increment container_users Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 05/22] vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 19:43   ` Alex Williamson
2016-11-14 19:43     ` [Qemu-devel] " Alex Williamson
2016-11-15 15:03     ` Kirti Wankhede
2016-11-15 15:03       ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 06/22] vfio iommu type1: Update arguments of vfio_lock_acct Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42   ` Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 07/22] vfio iommu type1: Update argument of vaddr_get_pfn() Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 08/22] vfio iommu type1: Add find_iommu_group() function Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 09/22] vfio iommu type1: Add task structure to vfio_dma Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-15  5:26   ` Alexey Kardashevskiy
2016-11-15  5:26     ` [Qemu-devel] " Alexey Kardashevskiy
2016-11-14 15:42 ` [PATCH v12 10/22] vfio iommu type1: Add support for mediated devices Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 23:25   ` Alex Williamson
2016-11-14 23:25     ` [Qemu-devel] " Alex Williamson
2016-11-15 15:09     ` Kirti Wankhede [this message]
2016-11-15 15:09       ` Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 23:58   ` Alex Williamson
2016-11-14 23:58     ` [Qemu-devel] " Alex Williamson
2016-11-15 15:11     ` Kirti Wankhede
2016-11-15 15:11       ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 12/22] vfio: Add notifier callback to parent's ops structure of mdev Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-15  6:45   ` Jike Song
2016-11-15  6:45     ` [Qemu-devel] " Jike Song
2016-11-15  8:11     ` Kirti Wankhede
2016-11-15  8:11       ` [Qemu-devel] " Kirti Wankhede
2016-11-15  9:30       ` Jike Song
2016-11-15  9:30         ` [Qemu-devel] " Jike Song
2016-11-15 15:19     ` Alex Williamson
2016-11-15 15:19       ` [Qemu-devel] " Alex Williamson
2016-11-16  5:52       ` Jike Song
2016-11-16  5:52         ` [Qemu-devel] " Jike Song
2016-11-14 15:42 ` [PATCH v12 13/22] vfio: Introduce common function to add capabilities Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 14/22] vfio_pci: Update vfio_pci to use vfio_info_add_capability() Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 15/22] vfio: Introduce vfio_set_irqs_validate_and_prepare() Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 16/22] vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare() Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 17/22] vfio_platform: " Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 18/22] vfio: Define device_api strings Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 19/22] docs: Add Documentation for Mediated devices Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 20/22] docs: Sysfs ABI for mediated device framework Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 21/22] docs: Sample driver to demonstrate how to use Mediated " Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede
2016-11-14 15:42 ` [PATCH v12 22/22] MAINTAINERS: Add entry VFIO based Mediated device drivers Kirti Wankhede
2016-11-14 15:42   ` [Qemu-devel] " Kirti Wankhede

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc5b3724-6a82-15db-1a72-a42e74403c6f@nvidia.com \
    --to=kwankhede@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=bjsdjshi@linux.vnet.ibm.com \
    --cc=cjia@nvidia.com \
    --cc=jike.song@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.