From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: <pbonzini@redhat.com>, <kraxel@redhat.com>, <cjia@nvidia.com>,
<qemu-devel@nongnu.org>, <kvm@vger.kernel.org>,
<kevin.tian@intel.com>, <jike.song@intel.com>,
<bjsdjshi@linux.vnet.ibm.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices
Date: Tue, 8 Nov 2016 20:36:34 +0530
Message-ID: <f99b668b-e938-9fde-d166-cacfc7b07a53@nvidia.com>
In-Reply-To: <20161107161619.66e03d8f@t450s.home>
On 11/8/2016 4:46 AM, Alex Williamson wrote:
> On Sat, 5 Nov 2016 02:40:44 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>
...
>> -static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +static int __vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
>> +				    int prot, unsigned long *pfn_base,
>> +				    bool do_accounting)
>> +{
>> +	struct task_struct *task = dma->task;
>> +	unsigned long limit = task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>> +	bool lock_cap = dma->mlock_cap;
>> +	struct mm_struct *mm = dma->addr_space->mm;
>> +	int ret;
>> +	bool rsvd;
>> +
>> +	ret = vaddr_get_pfn(mm, vaddr, prot, pfn_base);
>> +	if (ret)
>> +		return ret;
>> +
>> +	rsvd = is_invalid_reserved_pfn(*pfn_base);
>> +
>> +	if (!rsvd && !lock_cap && mm->locked_vm + 1 > limit) {
>> +		put_pfn(*pfn_base, prot);
>> +		pr_warn("%s: Task %s (%d) RLIMIT_MEMLOCK (%ld) exceeded\n",
>> +			__func__, task->comm, task_pid_nr(task),
>> +			limit << PAGE_SHIFT);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	if (!rsvd && do_accounting)
>> +		vfio_lock_acct(mm, 1);
>> +
>> +	return 1;
>> +}
>> +
>> +static void __vfio_unpin_page_external(struct vfio_addr_space *addr_space,
>> +				       unsigned long pfn, int prot,
>> +				       bool do_accounting)
>> +{
>> +	put_pfn(pfn, prot);
>> +
>> +	if (do_accounting)
>> +		vfio_lock_acct(addr_space->mm, -1);
>
> Can't we batch this like we do elsewhere? Intel folks, AIUI you intend
> to pin all VM memory through this side channel, have you tested the
> scalability and performance of this with larger VMs? Our vfio_pfn
> data structure alone is 40 bytes per pinned page, which means for
> each 1GB of VM memory, we have 10MBs worth of struct vfio_pfn!
> Additionally, unmapping each 1GB of VM memory will result in 256k
> separate vfio_lock_acct() callbacks. I'm concerned that we're not
> being efficient enough in either space or time.
>
> One thought might be whether we really need to save the pfn, we better
> always get the same result if we pin it again, or maybe we can just do
> a lookup through the mm at that point without re-pinning. Could we get
> to the point where we only need an atomic_t ref count per page in a
> linear array relative to the IOVA?
OK. Is system RAM hot-plug supported? How is it handled? Are there
DMA_MAP calls for the additional range on such a hot-plug? If we used a
linear array, we would have to realloc it on memory hot-plug, correct?
> That would give us 1MB per 1GB
> overhead. The semantics of the pin and unpin would make more sense then
> too, both would take an IOVA range, only pinning would need a return
> mechanism. For instance:
>
> int pin_pages(void *iommu_data, dma_addr_t iova_base,
>               int npage, unsigned long *pfn_base);
>
> This would pin physically contiguous pages up to npage, returning the
> base pfn and returning the number of pages pinned (<= npage). The
> vendor driver would make multiple calls to fill the necessary range.
With the current patch, the input is a user_pfn[] array and an npage
count:

int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
		   int npage, int prot, unsigned long *phys_pfn)

When the guest allocates memory with malloc(), the gfns would not be
contiguous, right? Those gfns (user_pfns) are what get passed in here.
Is there any case where we could get a pin/unpin request for contiguous
pages?
> Unpin would then simply be:
>
> void unpin_pages(void *iommu_data, dma_addr_t iova_base, int npage);
>
> Hugepage usage would really make such an interface shine (ie. 2MB+
> contiguous ranges). A downside would be the overhead of getting the
> group and container reference in vfio for each callback, perhaps we'd
> need to figure out how the vendor driver could hold that reference.
In the very initial phases of the proposal, I had suggested keeping a
pointer to container->iommu_data in struct mdev_device, but that was
discarded.
> The current API of passing around pfn arrays, further increasing the
> overhead of the whole ecosystem just makes me cringe though.
>
...
>> +		if (ret <= 0) {
>> +			WARN_ON(!ret);
>> +			goto pin_unwind;
>> +		}
>> +
>> +		mutex_lock(&dma->addr_space->pfn_list_lock);
>> +
>> +		/* search if pfn exist */
>> +		p = vfio_find_pfn(dma->addr_space, pfn[i]);
>> +		if (p) {
>> +			atomic_inc(&p->ref_count);
>
> We never test whether (p->prot == prot), shouldn't we be doing
> something in that case? In fact, why do we allow the side-channel
> through the .{un}pin_pages to specify page protection flags that might
> be different than the user specified for the DMA_MAP? If the user
> specified read-only, the vendor driver should not be allowed to
> override with read-write access.
>
The user-specified protection flags for DMA_MAP could be

	prot = IOMMU_WRITE | IOMMU_READ;

but the vendor driver can request that a page be pinned read-only, i.e.
with IOMMU_READ only. In that case the pin should still be allowed,
right? Then the check should be if (p->prot & prot).
Thanks,
Kirti