From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753075AbcKHORJ (ORCPT <rfc822;w@1wt.eu>);
        Tue, 8 Nov 2016 09:17:09 -0500
Received: from hqemgate15.nvidia.com ([216.228.121.64]:3808 "EHLO
        hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751423AbcKHORG (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 8 Nov 2016 09:17:06 -0500
X-PGP-Universal: processed;
        by hqpgpgate101.nvidia.com on Tue, 08 Nov 2016 06:17:04 -0800
Subject: Re: [PATCH v11 09/22] vfio iommu type1: Add task structure to
 vfio_dma
To: Alex Williamson <alex.williamson@redhat.com>
References: <1478293856-8191-1-git-send-email-kwankhede@nvidia.com>
 <1478293856-8191-10-git-send-email-kwankhede@nvidia.com>
 <20161107140348.55176252@t450s.home>
CC: <pbonzini@redhat.com>, <kraxel@redhat.com>, <cjia@nvidia.com>,
        <qemu-devel@nongnu.org>, <kvm@vger.kernel.org>, <kevin.tian@intel.com>,
        <jike.song@intel.com>, <bjsdjshi@linux.vnet.ibm.com>,
        <linux-kernel@vger.kernel.org>
X-Nvconfidentiality: public
From: Kirti Wankhede <kwankhede@nvidia.com>
Message-ID: <71e24995-1678-7e43-90fa-7798cfcdebbc@nvidia.com>
Date: Tue, 8 Nov 2016 19:43:25 +0530
MIME-Version: 1.0
In-Reply-To: <20161107140348.55176252@t450s.home>
X-Originating-IP: [10.24.216.210]
X-ClientProxiedBy: DRHKMAIL101.nvidia.com (10.25.59.15) To
 bgmail102.nvidia.com (10.25.59.11)
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 11/8/2016 2:33 AM, Alex Williamson wrote:
> On Sat, 5 Nov 2016 02:40:43 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 

...

>>  static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  			   struct vfio_iommu_type1_dma_map *map)
>>  {
>>  	dma_addr_t iova = map->iova;
>>  	unsigned long vaddr = map->vaddr;
>>  	size_t size = map->size;
>> -	long npage;
>>  	int ret = 0, prot = 0;
>>  	uint64_t mask;
>>  	struct vfio_dma *dma;
>> -	unsigned long pfn;
>> +	struct vfio_addr_space *addr_space;
>> +	struct mm_struct *mm;
>> +	bool free_addr_space_on_err = false;
>>  
>>  	/* Verify that none of our __u64 fields overflow */
>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>> @@ -608,47 +685,56 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	mutex_lock(&iommu->lock);
>>  
>>  	if (vfio_find_dma(iommu, iova, size)) {
>> -		mutex_unlock(&iommu->lock);
>> -		return -EEXIST;
>> +		ret = -EEXIST;
>> +		goto do_map_err;
>> +	}
>> +
>> +	mm = get_task_mm(current);
>> +	if (!mm) {
>> +		ret = -ENODEV;
> 
> -EFAULT?
>

The -ENODEV return comes from the original code in vfio_pin_pages():
        if (!current->mm)
                return -ENODEV;

I did consider changing it to -EFAULT, but went back to -ENODEV to stay
consistent with the original error code.

Should I still change this return to -EFAULT?


>> +		goto do_map_err;
>> +	}
>> +
>> +	addr_space = vfio_find_addr_space(iommu, mm);
>> +	if (addr_space) {
>> +		atomic_inc(&addr_space->ref_count);
>> +		mmput(mm);
>> +	} else {
>> +		addr_space = kzalloc(sizeof(*addr_space), GFP_KERNEL);
>> +		if (!addr_space) {
>> +			ret = -ENOMEM;
>> +			goto do_map_err;
>> +		}
>> +		addr_space->mm = mm;
>> +		atomic_set(&addr_space->ref_count, 1);
>> +		list_add(&addr_space->next, &iommu->addr_space_list);
>> +		free_addr_space_on_err = true;
>>  	}
>>  
>>  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>>  	if (!dma) {
>> -		mutex_unlock(&iommu->lock);
>> -		return -ENOMEM;
>> +		if (free_addr_space_on_err) {
>> +			mmput(mm);
>> +			list_del(&addr_space->next);
>> +			kfree(addr_space);
>> +		}
>> +		ret = -ENOMEM;
>> +		goto do_map_err;
>>  	}
>>  
>>  	dma->iova = iova;
>>  	dma->vaddr = vaddr;
>>  	dma->prot = prot;
>> +	dma->addr_space = addr_space;
>> +	get_task_struct(current);
>> +	dma->task = current;
>> +	dma->mlock_cap = capable(CAP_IPC_LOCK);
> 
> 
> How do you reason we can cache this?  Does the fact that the process
> had this capability at the time that it did a DMA_MAP imply that it
> necessarily still has this capability when an external user (vendor
> driver) tries to pin pages?  I don't see how we can make that
> assumption.
> 
> 

Would a process change its MEMLOCK limit at runtime? I don't think it
should; correct me if I'm wrong. QEMU doesn't do that, right?

The function capable() checks the capabilities of the current task. But
vfio_pin_pages() may be called from a different task, while the pages are
pinned from the address space of the task that mapped them. So we can't
use capable() in vfio_pin_pages().

If this capability shouldn't be cached, we have to use has_capability()
with dma->task as the argument in vfio_pin_pages():

 bool has_capability(struct task_struct *t, int cap)
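
A rough userspace sketch of the difference between the two options, not
kernel code. The names mock_task, mock_dma, mock_has_capability() and
MOCK_CAP_IPC_LOCK are hypothetical stand-ins for task_struct, vfio_dma,
has_capability() and CAP_IPC_LOCK; it only models when the check is
evaluated, assuming the mapping task's capabilities can change after
DMA_MAP.

```c
/*
 * Userspace model only, not kernel code. All mock_* names and
 * MOCK_CAP_IPC_LOCK are hypothetical stand-ins used for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_CAP_IPC_LOCK (1u << 0)

struct mock_task {
	unsigned int caps;	/* capability bits the task holds right now */
};

struct mock_dma {
	struct mock_task *task;	/* task that issued the DMA_MAP */
	bool mlock_cap;		/* capability snapshot taken at map time */
};

/* Stand-in for has_capability(t, cap): checks the given task, not the caller. */
bool mock_has_capability(const struct mock_task *t, unsigned int cap)
{
	assert(t != NULL);
	return (t->caps & cap) != 0;
}

/* Map time: remember the mapping task and snapshot its capability. */
void mock_dma_map(struct mock_dma *dma, struct mock_task *mapper)
{
	dma->task = mapper;
	dma->mlock_cap = mock_has_capability(mapper, MOCK_CAP_IPC_LOCK);
}

/* Pin time, cached variant: trusts the snapshot taken at map time. */
bool pin_may_skip_limit_cached(const struct mock_dma *dma)
{
	return dma->mlock_cap;
}

/* Pin time, live variant: asks the mapping task again on every pin. */
bool pin_may_skip_limit_live(const struct mock_dma *dma)
{
	return mock_has_capability(dma->task, MOCK_CAP_IPC_LOCK);
}
```

If the mapping task loses the capability after the map, the cached
variant still reports it while the live variant does not, which is the
case raised above.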

Thanks,
Kirti