From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: <cjia@nvidia.com>, <linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>
Subject: Re: [PATCH] vfio/type1: Restore mapping performance with mdev support
Date: Thu, 15 Dec 2016 23:27:54 +0530
Message-ID: <02707161-145f-25f3-ab47-c63d1de81e02@nvidia.com>
In-Reply-To: <20161215010347.3942360a@t450s.home>



On 12/15/2016 1:33 PM, Alex Williamson wrote:
> On Thu, 15 Dec 2016 12:05:35 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> On 12/14/2016 2:28 AM, Alex Williamson wrote:
>>> As part of the mdev support, type1 now gets a task reference per
>>> vfio_dma and uses that to get an mm reference for the task while
>>> working on accounting.  That's the correct thing to do for paths
>>> where we can't rely on using current, but there are still hot paths
>>> where we can optimize because we know we're invoked by the user.
>>>
>>> Specifically, vfio_pin_pages_remote() is only called when the user
>>> does DMA mapping (vfio_dma_do_map) or if an IOMMU group is added to
>>> a container with existing mappings (vfio_iommu_replay).  We can
>>> therefore use current->mm as well as rlimit() and capable() directly
>>> rather than going through the high overhead path via the stored
>>> task_struct.  We also know that vfio_dma_do_unmap() is only called
>>> via user ioctl, so we can also tune that path to be more lightweight.
>>>
>>> In a synthetic guest mapping test emulating a 1TB VM backed by a
>>> single 4GB range remapped multiple times across the address space,
>>> the mdev changes to the type1 backend introduced a roughly 25% hit
>>> in runtime of this test.  These changes restore it to nearly the
>>> previous performance for the interfaces exercised here,
>>> VFIO_IOMMU_MAP_DMA and release on close.
>>>
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c |  145 +++++++++++++++++++++------------------
>>>  1 file changed, 79 insertions(+), 66 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 9815e45..8dfeafb 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -103,6 +103,10 @@ struct vfio_pfn {
>>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
>>>  					(!list_empty(&iommu->domain_list))
>>>  
>>> +/* Make function bool options readable */
>>> +#define IS_CURRENT	(true)
>>> +#define DO_ACCOUNTING	(true)
>>> +
>>>  static int put_pfn(unsigned long pfn, int prot);
>>>  
>>>  /*
>>> @@ -264,7 +268,8 @@ static void vfio_lock_acct_bg(struct work_struct *work)
>>>  	kfree(vwork);
>>>  }
>>>  
>>> -static void vfio_lock_acct(struct task_struct *task, long npage)
>>> +static void vfio_lock_acct(struct task_struct *task,
>>> +			   long npage, bool is_current)
>>>  {
>>>  	struct vwork *vwork;
>>>  	struct mm_struct *mm;
>>> @@ -272,24 +277,31 @@ static void vfio_lock_acct(struct task_struct *task, long npage)
>>>  	if (!npage)
>>>  		return;
>>>  
>>> -	mm = get_task_mm(task);
>>> +	mm = is_current ? task->mm : get_task_mm(task);
>>>  	if (!mm)
>>> -		return; /* process exited or nothing to do */
>>> +		return; /* process exited */
>>>  
>>>  	if (down_write_trylock(&mm->mmap_sem)) {
>>>  		mm->locked_vm += npage;
>>>  		up_write(&mm->mmap_sem);
>>> -		mmput(mm);
>>> +		if (!is_current)
>>> +			mmput(mm);
>>>  		return;
>>>  	}
>>>  
>>> +	if (is_current) {
>>> +		mm = get_task_mm(task);
>>> +		if (!mm)
>>> +			return;
>>> +	}
>>> +
>>>  	/*
>>>  	 * Couldn't get mmap_sem lock, so must setup to update
>>>  	 * mm->locked_vm later. If locked_vm were atomic, we
>>>  	 * wouldn't need this silliness
>>>  	 */
>>>  	vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
>>> -	if (!vwork) {
>>> +	if (WARN_ON(!vwork)) {
>>>  		mmput(mm);
>>>  		return;
>>>  	}
>>> @@ -345,13 +357,13 @@ static int put_pfn(unsigned long pfn, int prot)
>>>  }
>>>  
>>>  static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>>> -			 int prot, unsigned long *pfn)
>>> +			 int prot, unsigned long *pfn, bool is_current)
>>>  {
>>>  	struct page *page[1];
>>>  	struct vm_area_struct *vma;
>>>  	int ret;
>>>  
>>> -	if (mm == current->mm) {
>>> +	if (is_current) {  
>>
>> With this change, if vfio_pin_page_external() gets called from QEMU
>> process context, for example in response to some BAR0 register access,
>> it will still fall back to the slow path, get_user_pages_remote(). We
>> don't have to change this function. This path already takes care of
>> taking the best possible path.
>>
>> That also makes me think: vfio_pin_page_external() uses the task
>> structure to get the mlock limit and capability. The expectation is that
>> an mdev vendor driver shouldn't pin all system memory, but if any mdev
>> driver does that, then that driver might see such a performance impact.
>> Should we optimize this path if (dma->task == current)?
> 
> Hi Kirti,
> 
> I was actually trying to avoid the (task == current) test with this
> change because I wasn't sure how reliable it is.  Is there a
> possibility that this test generates a false positive if current
> coincidentally matches our task and does that allow us the same
> opportunities for making use of current that we have when we know in a
> process context execution path?  The above change makes this a more
> direct association.  Can you show that inferring the process context is
> correct?  Thanks,

We take a reference on the task structure with get_task_struct(current)
before saving it in dma->task, and that reference is dropped with
put_task_struct() from vfio_remove_dma(). That ensures we hold a valid
reference to the task structure until we remove/free that dma structure.
Why would the check (dma->task == current) be a false positive? A vendor
driver can call vfio_pin_pages() on access to some emulated register from
the same task that mapped the dma range; in that case this check would be
true.
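
The reference-holding argument above can be sketched in plain userspace C.
This is only an illustration, not the kernel code: mock_task, mock_dma and
the mock_* helpers are hypothetical stand-ins for task_struct, vfio_dma,
get_task_struct(), put_task_struct() and the pin path. It shows only that
a pointer comparison is stable while a reference is held; it does not show
whether current->mm is safe to use in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only models the usage count. */
struct mock_task {
	int usage;
};

/* Hypothetical stand-in for vfio_dma; only models the saved task. */
struct mock_dma {
	struct mock_task *task;
};

/* Models get_task_struct(): take one reference. */
static void mock_get_task(struct mock_task *t)
{
	t->usage++;
}

/* Models put_task_struct(): drop one reference. */
static void mock_put_task(struct mock_task *t)
{
	t->usage--;
}

/* Map path: take a reference on the mapping task, then save it. */
static void mock_dma_map(struct mock_dma *dma, struct mock_task *cur)
{
	mock_get_task(cur);
	dma->task = cur;
}

/* Remove path: drop the reference that was taken at map time. */
static void mock_dma_remove(struct mock_dma *dma)
{
	assert(dma->task != NULL);
	mock_put_task(dma->task);
	dma->task = NULL;
}

/*
 * Pin path: while the reference is held the saved pointer cannot be
 * freed and reused, so comparing it with the caller identifies the
 * mapping task.
 */
static bool mock_pin_is_mapping_task(const struct mock_dma *dma,
				     const struct mock_task *cur)
{
	return dma->task == cur;
}
```

With one task that maps and another that does not, mock_pin_is_mapping_task()
returns true only for the mapping task, and the usage count returns to its
starting value after mock_dma_remove().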

Thanks,
Kirti
