* [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-04 23:33 Si-Wei Liu
From: Si-Wei Liu @ 2020-11-04 23:33 UTC (permalink / raw)
  To: mst, jasowang, lingshan.zhu
  Cc: joao.m.martins, boris.ostrovsky, linux-kernel, virtualization,
	si-wei.liu

Pinned pages are not properly accounted, particularly when a
mapping error occurs on IOTLB update. Clean up dangling
pinned pages in the error path.

The memory usage for bookkeeping pinned pages is reverted
to what it was before: only a single free page is needed.
This helps reduce the host memory demand for VMs with a large
amount of memory, or when the host is running short of free
memory.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
Changes in v2:
- Drop the reversion patch
- Fix unhandled page leak towards the end of page_list

 drivers/vhost/vdpa.c | 79 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index b6d9016..e112854 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
 
 	if (r)
 		vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
+	else
+		atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
 
 	return r;
 }
@@ -591,14 +593,16 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
 	unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
 	unsigned int gup_flags = FOLL_LONGTERM;
 	unsigned long npages, cur_base, map_pfn, last_pfn = 0;
-	unsigned long locked, lock_limit, pinned, i;
+	unsigned long lock_limit, sz2pin, nchunks, i;
 	u64 iova = msg->iova;
+	long pinned;
 	int ret = 0;
 
 	if (vhost_iotlb_itree_first(iotlb, msg->iova,
 				    msg->iova + msg->size - 1))
 		return -EEXIST;
 
+	/* Limit the use of memory for bookkeeping */
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list)
 		return -ENOMEM;
@@ -607,52 +611,75 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
 		gup_flags |= FOLL_WRITE;
 
 	npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
-	if (!npages)
-		return -EINVAL;
+	if (!npages) {
+		ret = -EINVAL;
+		goto free;
+	}
 
 	mmap_read_lock(dev->mm);
 
-	locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if (locked > lock_limit) {
+	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
 		ret = -ENOMEM;
-		goto out;
+		goto unlock;
 	}
 
 	cur_base = msg->uaddr & PAGE_MASK;
 	iova &= PAGE_MASK;
+	nchunks = 0;
 
 	while (npages) {
-		pinned = min_t(unsigned long, npages, list_size);
-		ret = pin_user_pages(cur_base, pinned,
-				     gup_flags, page_list, NULL);
-		if (ret != pinned)
+		sz2pin = min_t(unsigned long, npages, list_size);
+		pinned = pin_user_pages(cur_base, sz2pin,
+					gup_flags, page_list, NULL);
+		if (sz2pin != pinned) {
+			if (pinned < 0) {
+				ret = pinned;
+			} else {
+				unpin_user_pages(page_list, pinned);
+				ret = -ENOMEM;
+			}
 			goto out;
+		}
+		nchunks++;
 
 		if (!last_pfn)
 			map_pfn = page_to_pfn(page_list[0]);
 
-		for (i = 0; i < ret; i++) {
+		for (i = 0; i < pinned; i++) {
 			unsigned long this_pfn = page_to_pfn(page_list[i]);
 			u64 csize;
 
 			if (last_pfn && (this_pfn != last_pfn + 1)) {
 				/* Pin a contiguous chunk of memory */
 				csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
-				if (vhost_vdpa_map(v, iova, csize,
-						   map_pfn << PAGE_SHIFT,
-						   msg->perm))
+				ret = vhost_vdpa_map(v, iova, csize,
+						     map_pfn << PAGE_SHIFT,
+						     msg->perm);
+				if (ret) {
+					/*
+					 * Unpin the pages that are left unmapped
+					 * from this point on in the current
+					 * page_list. The remaining outstanding
+					 * ones which may stride across several
+					 * chunks will be covered in the common
+					 * error path subsequently.
+					 */
+					unpin_user_pages(&page_list[i],
+							 pinned - i);
 					goto out;
+				}
+
 				map_pfn = this_pfn;
 				iova += csize;
+				nchunks = 0;
 			}
 
 			last_pfn = this_pfn;
 		}
 
-		cur_base += ret << PAGE_SHIFT;
-		npages -= ret;
+		cur_base += pinned << PAGE_SHIFT;
+		npages -= pinned;
 	}
 
 	/* Pin the rest chunk */
@@ -660,10 +687,26 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
 			     map_pfn << PAGE_SHIFT, msg->perm);
 out:
 	if (ret) {
+		if (nchunks && last_pfn) {
+			unsigned long pfn;
+
+			/*
+			 * Unpin the outstanding pages which are yet to be
+			 * mapped but haven't due to vdpa_map() or
+			 * pin_user_pages() failure.
+			 *
+			 * Mapped pages are accounted in vdpa_map(), hence
+			 * the corresponding unpinning will be handled by
+			 * vdpa_unmap().
+			 */
+			for (pfn = map_pfn; pfn <= last_pfn; pfn++)
+				unpin_user_page(pfn_to_page(pfn));
+		}
 		vhost_vdpa_unmap(v, msg->iova, msg->size);
-		atomic64_sub(npages, &dev->mm->pinned_vm);
 	}
+unlock:
 	mmap_read_unlock(dev->mm);
+free:
 	free_page((unsigned long)page_list);
 	return ret;
 }
-- 
1.8.3.1
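
For readers tracing the control flow, below is a minimal userspace sketch of
the scheme this patch moves to: split the pinned pfns into contiguous chunks,
map each chunk, and on a mapping failure unpin only the pfns that were never
handed off to the map helper. It is an illustration only: map_chunk() and
unpin_pfn() are hypothetical stand-ins for vhost_vdpa_map() and
unpin_user_page(), and locking, accounting and the batched pin_user_pages()
step are all omitted.

#include <stdio.h>

/* Hypothetical stand-in for vhost_vdpa_map(); fails the chunk at pfn 100. */
static int map_chunk(unsigned long first_pfn, unsigned long npfns)
{
	return first_pfn == 100 ? -1 : 0;
}

/* Hypothetical stand-in for unpin_user_page(). */
static void unpin_pfn(unsigned long pfn)
{
	printf("unpin pfn %lu\n", pfn);
}

int main(void)
{
	/* Pretend these pfns were pinned: chunks 5..7 and 100..101. */
	unsigned long pfns[] = { 5, 6, 7, 100, 101 };
	unsigned long n = sizeof(pfns) / sizeof(pfns[0]);
	unsigned long map_pfn = pfns[0], last_pfn = 0, i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		unsigned long this_pfn = pfns[i];

		/* Chunk boundary: map [map_pfn..last_pfn] before moving on. */
		if (last_pfn && this_pfn != last_pfn + 1) {
			ret = map_chunk(map_pfn, last_pfn - map_pfn + 1);
			if (ret)
				goto out;
			map_pfn = this_pfn;
		}
		last_pfn = this_pfn;
	}
	/* Map the trailing chunk (fails here, by construction). */
	ret = map_chunk(map_pfn, last_pfn - map_pfn + 1);
out:
	if (ret) {
		unsigned long pfn;

		/* Roll back only the pfns never handed to map_chunk(). */
		for (pfn = map_pfn; pfn <= last_pfn; pfn++)
			unpin_pfn(pfn);
	}
	return ret ? 1 : 0;
}

Running it maps chunk 5..7, fails on chunk 100..101, and unpins only pfns
100 and 101, mirroring how the patch's common error path walks
map_pfn..last_pfn while leaving already-mapped pages to vhost_vdpa_unmap().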



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-05  3:26 ` Jason Wang
From: Jason Wang @ 2020-11-05  3:26 UTC (permalink / raw)
  To: Si-Wei Liu, mst, lingshan.zhu
  Cc: joao.m.martins, boris.ostrovsky, linux-kernel, virtualization


On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
> Pinned pages are not properly accounted particularly when
> mapping error occurs on IOTLB update. Clean up dangling
> pinned pages for the error path.
>
> The memory usage for bookkeeping pinned pages is reverted
> to what it was before: only one single free page is needed.
> This helps reduce the host memory demand for VM with a large
> amount of memory, or in the situation where host is running
> short of free memory.
>
> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
> Changes in v2:
> - Drop the reversion patch
> - Fix unhandled page leak towards the end of page_list
>
>   drivers/vhost/vdpa.c | 79 ++++++++++++++++++++++++++++++++++++++++------------
>   1 file changed, 61 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index b6d9016..e112854 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>   
>   	if (r)
>   		vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
> +	else
> +		atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>   
>   	return r;
>   }
> @@ -591,14 +593,16 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>   	unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>   	unsigned int gup_flags = FOLL_LONGTERM;
>   	unsigned long npages, cur_base, map_pfn, last_pfn = 0;
> -	unsigned long locked, lock_limit, pinned, i;
> +	unsigned long lock_limit, sz2pin, nchunks, i;
>   	u64 iova = msg->iova;
> +	long pinned;
>   	int ret = 0;
>   
>   	if (vhost_iotlb_itree_first(iotlb, msg->iova,
>   				    msg->iova + msg->size - 1))
>   		return -EEXIST;
>   
> +	/* Limit the use of memory for bookkeeping */
>   	page_list = (struct page **) __get_free_page(GFP_KERNEL);
>   	if (!page_list)
>   		return -ENOMEM;
> @@ -607,52 +611,75 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>   		gup_flags |= FOLL_WRITE;
>   
>   	npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
> -	if (!npages)
> -		return -EINVAL;
> +	if (!npages) {
> +		ret = -EINVAL;
> +		goto free;
> +	}
>   
>   	mmap_read_lock(dev->mm);
>   
> -	locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>   	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -	if (locked > lock_limit) {
> +	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>   		ret = -ENOMEM;
> -		goto out;
> +		goto unlock;
>   	}
>   
>   	cur_base = msg->uaddr & PAGE_MASK;
>   	iova &= PAGE_MASK;
> +	nchunks = 0;
>   
>   	while (npages) {
> -		pinned = min_t(unsigned long, npages, list_size);
> -		ret = pin_user_pages(cur_base, pinned,
> -				     gup_flags, page_list, NULL);
> -		if (ret != pinned)
> +		sz2pin = min_t(unsigned long, npages, list_size);
> +		pinned = pin_user_pages(cur_base, sz2pin,
> +					gup_flags, page_list, NULL);
> +		if (sz2pin != pinned) {
> +			if (pinned < 0) {
> +				ret = pinned;
> +			} else {
> +				unpin_user_pages(page_list, pinned);
> +				ret = -ENOMEM;
> +			}
>   			goto out;
> +		}
> +		nchunks++;
>   
>   		if (!last_pfn)
>   			map_pfn = page_to_pfn(page_list[0]);
>   
> -		for (i = 0; i < ret; i++) {
> +		for (i = 0; i < pinned; i++) {
>   			unsigned long this_pfn = page_to_pfn(page_list[i]);
>   			u64 csize;
>   
>   			if (last_pfn && (this_pfn != last_pfn + 1)) {
>   				/* Pin a contiguous chunk of memory */
>   				csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
> -				if (vhost_vdpa_map(v, iova, csize,
> -						   map_pfn << PAGE_SHIFT,
> -						   msg->perm))
> +				ret = vhost_vdpa_map(v, iova, csize,
> +						     map_pfn << PAGE_SHIFT,
> +						     msg->perm);
> +				if (ret) {
> +					/*
> +					 * Unpin the pages that are left unmapped
> +					 * from this point on in the current
> +					 * page_list. The remaining outstanding
> +					 * ones which may stride across several
> +					 * chunks will be covered in the common
> +					 * error path subsequently.
> +					 */
> +					unpin_user_pages(&page_list[i],
> +							 pinned - i);


Can we simply do last_pfn = this_pfn here?


>   					goto out;
> +				}
> +
>   				map_pfn = this_pfn;
>   				iova += csize;
> +				nchunks = 0;
>   			}
>   
>   			last_pfn = this_pfn;
>   		}
>   
> -		cur_base += ret << PAGE_SHIFT;
> -		npages -= ret;
> +		cur_base += pinned << PAGE_SHIFT;
> +		npages -= pinned;
>   	}
>   
>   	/* Pin the rest chunk */
> @@ -660,10 +687,26 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>   			     map_pfn << PAGE_SHIFT, msg->perm);
>   out:
>   	if (ret) {
> +		if (nchunks && last_pfn) {


Any reason for checking last_pfn here?

Note that we did:

+		nchunks++;
  
  		if (!last_pfn)
  			map_pfn = page_to_pfn(page_list[0]);


Thanks


> +			unsigned long pfn;
> +
> +			/*
> +			 * Unpin the outstanding pages which are yet to be
> +			 * mapped but haven't due to vdpa_map() or
> +			 * pin_user_pages() failure.
> +			 *
> +			 * Mapped pages are accounted in vdpa_map(), hence
> +			 * the corresponding unpinning will be handled by
> +			 * vdpa_unmap().
> +			 */
> +			for (pfn = map_pfn; pfn <= last_pfn; pfn++)
> +				unpin_user_page(pfn_to_page(pfn));
> +		}
>   		vhost_vdpa_unmap(v, msg->iova, msg->size);
> -		atomic64_sub(npages, &dev->mm->pinned_vm);
>   	}
> +unlock:
>   	mmap_read_unlock(dev->mm);
> +free:
>   	free_page((unsigned long)page_list);
>   	return ret;
>   }



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-05 22:57   ` si-wei liu
From: si-wei liu @ 2020-11-05 22:57 UTC (permalink / raw)
  To: Jason Wang, mst, lingshan.zhu
  Cc: joao.m.martins, boris.ostrovsky, linux-kernel, virtualization


On 11/4/2020 7:26 PM, Jason Wang wrote:
>
> On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
>> Pinned pages are not properly accounted particularly when
>> mapping error occurs on IOTLB update. Clean up dangling
>> pinned pages for the error path.
>>
>> The memory usage for bookkeeping pinned pages is reverted
>> to what it was before: only one single free page is needed.
>> This helps reduce the host memory demand for VM with a large
>> amount of memory, or in the situation where host is running
>> short of free memory.
>>
>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>> Changes in v2:
>> - Drop the reversion patch
>> - Fix unhandled page leak towards the end of page_list
>>
>>   drivers/vhost/vdpa.c | 79 
>> ++++++++++++++++++++++++++++++++++++++++------------
>>   1 file changed, 61 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index b6d9016..e112854 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>>         if (r)
>>           vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
>> +    else
>> +        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>>         return r;
>>   }
>> @@ -591,14 +593,16 @@ static int 
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>       unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>>       unsigned int gup_flags = FOLL_LONGTERM;
>>       unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>> -    unsigned long locked, lock_limit, pinned, i;
>> +    unsigned long lock_limit, sz2pin, nchunks, i;
>>       u64 iova = msg->iova;
>> +    long pinned;
>>       int ret = 0;
>>         if (vhost_iotlb_itree_first(iotlb, msg->iova,
>>                       msg->iova + msg->size - 1))
>>           return -EEXIST;
>>   +    /* Limit the use of memory for bookkeeping */
>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>       if (!page_list)
>>           return -ENOMEM;
>> @@ -607,52 +611,75 @@ static int 
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>           gup_flags |= FOLL_WRITE;
>>         npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> 
>> PAGE_SHIFT;
>> -    if (!npages)
>> -        return -EINVAL;
>> +    if (!npages) {
>> +        ret = -EINVAL;
>> +        goto free;
>> +    }
>>         mmap_read_lock(dev->mm);
>>   -    locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>>       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>> -
>> -    if (locked > lock_limit) {
>> +    if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>>           ret = -ENOMEM;
>> -        goto out;
>> +        goto unlock;
>>       }
>>         cur_base = msg->uaddr & PAGE_MASK;
>>       iova &= PAGE_MASK;
>> +    nchunks = 0;
>>         while (npages) {
>> -        pinned = min_t(unsigned long, npages, list_size);
>> -        ret = pin_user_pages(cur_base, pinned,
>> -                     gup_flags, page_list, NULL);
>> -        if (ret != pinned)
>> +        sz2pin = min_t(unsigned long, npages, list_size);
>> +        pinned = pin_user_pages(cur_base, sz2pin,
>> +                    gup_flags, page_list, NULL);
>> +        if (sz2pin != pinned) {
>> +            if (pinned < 0) {
>> +                ret = pinned;
>> +            } else {
>> +                unpin_user_pages(page_list, pinned);
>> +                ret = -ENOMEM;
>> +            }
>>               goto out;
>> +        }
>> +        nchunks++;
>>             if (!last_pfn)
>>               map_pfn = page_to_pfn(page_list[0]);
>>   -        for (i = 0; i < ret; i++) {
>> +        for (i = 0; i < pinned; i++) {
>>               unsigned long this_pfn = page_to_pfn(page_list[i]);
>>               u64 csize;
>>                 if (last_pfn && (this_pfn != last_pfn + 1)) {
>>                   /* Pin a contiguous chunk of memory */
>>                   csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>> -                if (vhost_vdpa_map(v, iova, csize,
>> -                           map_pfn << PAGE_SHIFT,
>> -                           msg->perm))
>> +                ret = vhost_vdpa_map(v, iova, csize,
>> +                             map_pfn << PAGE_SHIFT,
>> +                             msg->perm);
>> +                if (ret) {
>> +                    /*
>> +                     * Unpin the pages that are left unmapped
>> +                     * from this point on in the current
>> +                     * page_list. The remaining outstanding
>> +                     * ones which may stride across several
>> +                     * chunks will be covered in the common
>> +                     * error path subsequently.
>> +                     */
>> +                    unpin_user_pages(&page_list[i],
>> +                             pinned - i);
>
>
> Can we simply do last_pfn = this_pfn here?
Nope. They are not contiguous segments of memory. Note the
conditional (this_pfn != last_pfn + 1) that holds here.
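
To make the point concrete, here is a tiny standalone check with made-up
pfn values (5..7 pinned, then 100..101 pinned, assumed purely for
illustration):

#include <assert.h>

int main(void)
{
	unsigned long last_pfn = 7, this_pfn = 100;

	/* The driver's boundary test fires: 100 != 7 + 1, so the
	 * pages form two separate chunks. Doing last_pfn = this_pfn
	 * instead would fuse them into one bogus range whose error
	 * path would touch pfns 8..99 that were never pinned at all.
	 */
	assert(last_pfn && this_pfn != last_pfn + 1);
	return 0;
}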

>
>
>>                       goto out;
>> +                }
>> +
>>                   map_pfn = this_pfn;
>>                   iova += csize;
>> +                nchunks = 0;
>>               }
>>                 last_pfn = this_pfn;
>>           }
>>   -        cur_base += ret << PAGE_SHIFT;
>> -        npages -= ret;
>> +        cur_base += pinned << PAGE_SHIFT;
>> +        npages -= pinned;
>>       }
>>         /* Pin the rest chunk */
>> @@ -660,10 +687,26 @@ static int 
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>                    map_pfn << PAGE_SHIFT, msg->perm);
>>   out:
>>       if (ret) {
>> +        if (nchunks && last_pfn) {
>
>
> Any reason for checking last_pfn here?
>
> Note that we did:
>
> +        nchunks++;
>
>          if (!last_pfn)
>              map_pfn = page_to_pfn(page_list[0]);
It's explicit coding to make sure this common error path can be
reused whether or not last_pfn has a sane value assigned. I can
change it to an implicit WARN_ON() if need be.

Thanks,
-Siwei

>
>
> Thanks
>
>
>> +            unsigned long pfn;
>> +
>> +            /*
>> +             * Unpin the outstanding pages which are yet to be
>> +             * mapped but haven't due to vdpa_map() or
>> +             * pin_user_pages() failure.
>> +             *
>> +             * Mapped pages are accounted in vdpa_map(), hence
>> +             * the corresponding unpinning will be handled by
>> +             * vdpa_unmap().
>> +             */
>> +            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>> +                unpin_user_page(pfn_to_page(pfn));
>> +        }
>>           vhost_vdpa_unmap(v, msg->iova, msg->size);
>> -        atomic64_sub(npages, &dev->mm->pinned_vm);
>>       }
>> +unlock:
>>       mmap_read_unlock(dev->mm);
>> +free:
>>       free_page((unsigned long)page_list);
>>       return ret;
>>   }
>



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-09  3:21     ` Jason Wang
From: Jason Wang @ 2020-11-09  3:21 UTC (permalink / raw)
  To: si-wei liu, mst, lingshan.zhu
  Cc: joao.m.martins, boris.ostrovsky, linux-kernel, virtualization


On 2020/11/6 6:57 AM, si-wei liu wrote:
>
> On 11/4/2020 7:26 PM, Jason Wang wrote:
>>
>> On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
>>> Pinned pages are not properly accounted particularly when
>>> mapping error occurs on IOTLB update. Clean up dangling
>>> pinned pages for the error path.
>>>
>>> The memory usage for bookkeeping pinned pages is reverted
>>> to what it was before: only one single free page is needed.
>>> This helps reduce the host memory demand for VM with a large
>>> amount of memory, or in the situation where host is running
>>> short of free memory.
>>>
>>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> ---
>>> Changes in v2:
>>> - Drop the reversion patch
>>> - Fix unhandled page leak towards the end of page_list
>>>
>>>   drivers/vhost/vdpa.c | 79 
>>> ++++++++++++++++++++++++++++++++++++++++------------
>>>   1 file changed, 61 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>> index b6d9016..e112854 100644
>>> --- a/drivers/vhost/vdpa.c
>>> +++ b/drivers/vhost/vdpa.c
>>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>>>         if (r)
>>>           vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
>>> +    else
>>> +        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>>>         return r;
>>>   }
>>> @@ -591,14 +593,16 @@ static int 
>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>       unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>>>       unsigned int gup_flags = FOLL_LONGTERM;
>>>       unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>>> -    unsigned long locked, lock_limit, pinned, i;
>>> +    unsigned long lock_limit, sz2pin, nchunks, i;
>>>       u64 iova = msg->iova;
>>> +    long pinned;
>>>       int ret = 0;
>>>         if (vhost_iotlb_itree_first(iotlb, msg->iova,
>>>                       msg->iova + msg->size - 1))
>>>           return -EEXIST;
>>>   +    /* Limit the use of memory for bookkeeping */
>>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>>       if (!page_list)
>>>           return -ENOMEM;
>>> @@ -607,52 +611,75 @@ static int 
>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>           gup_flags |= FOLL_WRITE;
>>>         npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> 
>>> PAGE_SHIFT;
>>> -    if (!npages)
>>> -        return -EINVAL;
>>> +    if (!npages) {
>>> +        ret = -EINVAL;
>>> +        goto free;
>>> +    }
>>>         mmap_read_lock(dev->mm);
>>>   -    locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>>>       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>> -
>>> -    if (locked > lock_limit) {
>>> +    if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>>>           ret = -ENOMEM;
>>> -        goto out;
>>> +        goto unlock;
>>>       }
>>>         cur_base = msg->uaddr & PAGE_MASK;
>>>       iova &= PAGE_MASK;
>>> +    nchunks = 0;
>>>         while (npages) {
>>> -        pinned = min_t(unsigned long, npages, list_size);
>>> -        ret = pin_user_pages(cur_base, pinned,
>>> -                     gup_flags, page_list, NULL);
>>> -        if (ret != pinned)
>>> +        sz2pin = min_t(unsigned long, npages, list_size);
>>> +        pinned = pin_user_pages(cur_base, sz2pin,
>>> +                    gup_flags, page_list, NULL);
>>> +        if (sz2pin != pinned) {
>>> +            if (pinned < 0) {
>>> +                ret = pinned;
>>> +            } else {
>>> +                unpin_user_pages(page_list, pinned);
>>> +                ret = -ENOMEM;
>>> +            }
>>>               goto out;
>>> +        }
>>> +        nchunks++;
>>>             if (!last_pfn)
>>>               map_pfn = page_to_pfn(page_list[0]);
>>>   -        for (i = 0; i < ret; i++) {
>>> +        for (i = 0; i < pinned; i++) {
>>>               unsigned long this_pfn = page_to_pfn(page_list[i]);
>>>               u64 csize;
>>>                 if (last_pfn && (this_pfn != last_pfn + 1)) {
>>>                   /* Pin a contiguous chunk of memory */
>>>                   csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>>> -                if (vhost_vdpa_map(v, iova, csize,
>>> -                           map_pfn << PAGE_SHIFT,
>>> -                           msg->perm))
>>> +                ret = vhost_vdpa_map(v, iova, csize,
>>> +                             map_pfn << PAGE_SHIFT,
>>> +                             msg->perm);
>>> +                if (ret) {
>>> +                    /*
>>> +                     * Unpin the pages that are left unmapped
>>> +                     * from this point on in the current
>>> +                     * page_list. The remaining outstanding
>>> +                     * ones which may stride across several
>>> +                     * chunks will be covered in the common
>>> +                     * error path subsequently.
>>> +                     */
>>> +                    unpin_user_pages(&page_list[i],
>>> +                             pinned - i);
>>
>>
>> Can we simply do last_pfn = this_pfn here?
> Nope. They are not contiguous segments of memory. Noted the 
> conditional (this_pfn != last_pfn + 1) being held here.


Right.


>
>>
>>
>>>                       goto out;
>>> +                }
>>> +
>>>                   map_pfn = this_pfn;
>>>                   iova += csize;
>>> +                nchunks = 0;
>>>               }
>>>                 last_pfn = this_pfn;
>>>           }
>>>   -        cur_base += ret << PAGE_SHIFT;
>>> -        npages -= ret;
>>> +        cur_base += pinned << PAGE_SHIFT;
>>> +        npages -= pinned;
>>>       }
>>>         /* Pin the rest chunk */
>>> @@ -660,10 +687,26 @@ static int 
>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>                    map_pfn << PAGE_SHIFT, msg->perm);
>>>   out:
>>>       if (ret) {
>>> +        if (nchunks && last_pfn) {
>>
>>
>> Any reason for checking last_pfn here?
>>
>> Note that we did:
>>
>> +        nchunks++;
>>
>>          if (!last_pfn)
>>              map_pfn = page_to_pfn(page_list[0]);
> It's for explicit coding to make sure this common error path can be 
> reused no matter if last_pfn has a sane value assigned or not. I can 
> change it to an implicit WARN_ON() if need be.


Just to make sure I understand. A question: when will we get nchunks !=
0 but last_pfn == 0?

Thanks


> Thanks,
> -Siwei
>
>>
>>
>> Thanks
>>
>>
>>> +            unsigned long pfn;
>>> +
>>> +            /*
>>> +             * Unpin the outstanding pages which are yet to be
>>> +             * mapped but haven't due to vdpa_map() or
>>> +             * pin_user_pages() failure.
>>> +             *
>>> +             * Mapped pages are accounted in vdpa_map(), hence
>>> +             * the corresponding unpinning will be handled by
>>> +             * vdpa_unmap().
>>> +             */
>>> +            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>>> +                unpin_user_page(pfn_to_page(pfn));
>>> +        }
>>>           vhost_vdpa_unmap(v, msg->iova, msg->size);
>>> -        atomic64_sub(npages, &dev->mm->pinned_vm);
>>>       }
>>> +unlock:
>>>       mmap_read_unlock(dev->mm);
>>> +free:
>>>       free_page((unsigned long)page_list);
>>>       return ret;
>>>   }
>>
>



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-09 21:44       ` si-wei liu
From: si-wei liu @ 2020-11-09 21:44 UTC (permalink / raw)
  To: Jason Wang, mst, lingshan.zhu
  Cc: joao.m.martins, boris.ostrovsky, linux-kernel, virtualization


On 11/8/2020 7:21 PM, Jason Wang wrote:
>
> On 2020/11/6 6:57 AM, si-wei liu wrote:
>>
>> On 11/4/2020 7:26 PM, Jason Wang wrote:
>>>
>>> On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
>>>> Pinned pages are not properly accounted particularly when
>>>> mapping error occurs on IOTLB update. Clean up dangling
>>>> pinned pages for the error path.
>>>>
>>>> The memory usage for bookkeeping pinned pages is reverted
>>>> to what it was before: only one single free page is needed.
>>>> This helps reduce the host memory demand for VM with a large
>>>> amount of memory, or in the situation where host is running
>>>> short of free memory.
>>>>
>>>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> ---
>>>> Changes in v2:
>>>> - Drop the reversion patch
>>>> - Fix unhandled page leak towards the end of page_list
>>>>
>>>>   drivers/vhost/vdpa.c | 79 
>>>> ++++++++++++++++++++++++++++++++++++++++------------
>>>>   1 file changed, 61 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>>> index b6d9016..e112854 100644
>>>> --- a/drivers/vhost/vdpa.c
>>>> +++ b/drivers/vhost/vdpa.c
>>>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>>>>         if (r)
>>>>           vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
>>>> +    else
>>>> +        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>>>>         return r;
>>>>   }
>>>> @@ -591,14 +593,16 @@ static int 
>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>       unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>>>>       unsigned int gup_flags = FOLL_LONGTERM;
>>>>       unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>>>> -    unsigned long locked, lock_limit, pinned, i;
>>>> +    unsigned long lock_limit, sz2pin, nchunks, i;
>>>>       u64 iova = msg->iova;
>>>> +    long pinned;
>>>>       int ret = 0;
>>>>         if (vhost_iotlb_itree_first(iotlb, msg->iova,
>>>>                       msg->iova + msg->size - 1))
>>>>           return -EEXIST;
>>>>   +    /* Limit the use of memory for bookkeeping */
>>>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>>>       if (!page_list)
>>>>           return -ENOMEM;
>>>> @@ -607,52 +611,75 @@ static int 
>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>           gup_flags |= FOLL_WRITE;
>>>>         npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> 
>>>> PAGE_SHIFT;
>>>> -    if (!npages)
>>>> -        return -EINVAL;
>>>> +    if (!npages) {
>>>> +        ret = -EINVAL;
>>>> +        goto free;
>>>> +    }
>>>>         mmap_read_lock(dev->mm);
>>>>   -    locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>>>>       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>>> -
>>>> -    if (locked > lock_limit) {
>>>> +    if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>>>>           ret = -ENOMEM;
>>>> -        goto out;
>>>> +        goto unlock;
>>>>       }
>>>>         cur_base = msg->uaddr & PAGE_MASK;
>>>>       iova &= PAGE_MASK;
>>>> +    nchunks = 0;
>>>>         while (npages) {
>>>> -        pinned = min_t(unsigned long, npages, list_size);
>>>> -        ret = pin_user_pages(cur_base, pinned,
>>>> -                     gup_flags, page_list, NULL);
>>>> -        if (ret != pinned)
>>>> +        sz2pin = min_t(unsigned long, npages, list_size);
>>>> +        pinned = pin_user_pages(cur_base, sz2pin,
>>>> +                    gup_flags, page_list, NULL);
>>>> +        if (sz2pin != pinned) {
>>>> +            if (pinned < 0) {
>>>> +                ret = pinned;
>>>> +            } else {
>>>> +                unpin_user_pages(page_list, pinned);
>>>> +                ret = -ENOMEM;
>>>> +            }
>>>>               goto out;
>>>> +        }
>>>> +        nchunks++;
>>>>             if (!last_pfn)
>>>>               map_pfn = page_to_pfn(page_list[0]);
>>>>   -        for (i = 0; i < ret; i++) {
>>>> +        for (i = 0; i < pinned; i++) {
>>>>               unsigned long this_pfn = page_to_pfn(page_list[i]);
>>>>               u64 csize;
>>>>                 if (last_pfn && (this_pfn != last_pfn + 1)) {
>>>>                   /* Pin a contiguous chunk of memory */
>>>>                   csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>>>> -                if (vhost_vdpa_map(v, iova, csize,
>>>> -                           map_pfn << PAGE_SHIFT,
>>>> -                           msg->perm))
>>>> +                ret = vhost_vdpa_map(v, iova, csize,
>>>> +                             map_pfn << PAGE_SHIFT,
>>>> +                             msg->perm);
>>>> +                if (ret) {
>>>> +                    /*
>>>> +                     * Unpin the pages that are left unmapped
>>>> +                     * from this point on in the current
>>>> +                     * page_list. The remaining outstanding
>>>> +                     * ones which may stride across several
>>>> +                     * chunks will be covered in the common
>>>> +                     * error path subsequently.
>>>> +                     */
>>>> +                    unpin_user_pages(&page_list[i],
>>>> +                             pinned - i);
>>>
>>>
>>> Can we simply do last_pfn = this_pfn here?
>> Nope. They are not contiguous segments of memory. Noted the 
>> conditional (this_pfn != last_pfn + 1) being held here.
>
>
> Right.
>
>
>>
>>>
>>>
>>>>                       goto out;
>>>> +                }
>>>> +
>>>>                   map_pfn = this_pfn;
>>>>                   iova += csize;
>>>> +                nchunks = 0;
>>>>               }
>>>>                 last_pfn = this_pfn;
>>>>           }
>>>>   -        cur_base += ret << PAGE_SHIFT;
>>>> -        npages -= ret;
>>>> +        cur_base += pinned << PAGE_SHIFT;
>>>> +        npages -= pinned;
>>>>       }
>>>>         /* Pin the rest chunk */
>>>> @@ -660,10 +687,26 @@ static int 
>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>                    map_pfn << PAGE_SHIFT, msg->perm);
>>>>   out:
>>>>       if (ret) {
>>>> +        if (nchunks && last_pfn) {
>>>
>>>
>>> Any reason for checking last_pfn here?
>>>
>>> Note that we did:
>>>
>>> +        nchunks++;
>>>
>>>          if (!last_pfn)
>>>              map_pfn = page_to_pfn(page_list[0]);
>> It's for explicit coding to make sure this common error path can be 
>> reused no matter if last_pfn has a sane value assigned or not. I can 
>> change it to an implicit WARN_ON() if need be.
>
>
> Just to make sure I understand. A question, when will we get nchunks 
> != 0 but last_pfn == 0?
The current code has an implicit assumption that nchunks != 0 implies
last_pfn != 0. However, this assumption could break with future code
structure changes, e.g. if a failure occurs after the increment of
nchunks but before the for loop. I feel it'd be best to capture this
assumption with something explicit.
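
For illustration, an explicit capture could look something like this (a
sketch only, not necessarily what the actual v3 hunk does):

out:
	if (ret) {
		/* nchunks != 0 is assumed to imply last_pfn != 0; warn
		 * if that is ever broken, and keep the guard so the
		 * unpin loop below stays safe.
		 */
		if (nchunks && !WARN_ON(!last_pfn)) {
			unsigned long pfn;

			for (pfn = map_pfn; pfn <= last_pfn; pfn++)
				unpin_user_page(pfn_to_page(pfn));
		}
		vhost_vdpa_unmap(v, msg->iova, msg->size);
	}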

-Siwei

>
> Thanks
>
>
>> Thanks,
>> -Siwei
>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>> +            unsigned long pfn;
>>>> +
>>>> +            /*
>>>> +             * Unpin the outstanding pages which are yet to be
>>>> +             * mapped but haven't due to vdpa_map() or
>>>> +             * pin_user_pages() failure.
>>>> +             *
>>>> +             * Mapped pages are accounted in vdpa_map(), hence
>>>> +             * the corresponding unpinning will be handled by
>>>> +             * vdpa_unmap().
>>>> +             */
>>>> +            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>>>> +                unpin_user_page(pfn_to_page(pfn));
>>>> +        }
>>>>           vhost_vdpa_unmap(v, msg->iova, msg->size);
>>>> -        atomic64_sub(npages, &dev->mm->pinned_vm);
>>>>       }
>>>> +unlock:
>>>>       mmap_read_unlock(dev->mm);
>>>> +free:
>>>>       free_page((unsigned long)page_list);
>>>>       return ret;
>>>>   }
>>>
>>
>



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-09 22:42         ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2020-11-09 22:42 UTC (permalink / raw)
  To: si-wei liu
  Cc: Jason Wang, lingshan.zhu, joao.m.martins, boris.ostrovsky,
	linux-kernel, virtualization

On Mon, Nov 09, 2020 at 01:44:03PM -0800, si-wei liu wrote:
> 
> On 11/8/2020 7:21 PM, Jason Wang wrote:
> > 
> > On 2020/11/6 6:57 AM, si-wei liu wrote:
> > > 
> > > On 11/4/2020 7:26 PM, Jason Wang wrote:
> > > > 
> > > > On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
> > > > > Pinned pages are not properly accounted particularly when
> > > > > mapping error occurs on IOTLB update. Clean up dangling
> > > > > pinned pages for the error path.
> > > > > 
> > > > > The memory usage for bookkeeping pinned pages is reverted
> > > > > to what it was before: only one single free page is needed.
> > > > > This helps reduce the host memory demand for VM with a large
> > > > > amount of memory, or in the situation where host is running
> > > > > short of free memory.
> > > > > 
> > > > > Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
> > > > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > ---
> > > > > Changes in v2:
> > > > > - Drop the reversion patch
> > > > > - Fix unhandled page leak towards the end of page_list
> > > > > 
> > > > >   drivers/vhost/vdpa.c | 79
> > > > > ++++++++++++++++++++++++++++++++++++++++------------
> > > > >   1 file changed, 61 insertions(+), 18 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > index b6d9016..e112854 100644
> > > > > --- a/drivers/vhost/vdpa.c
> > > > > +++ b/drivers/vhost/vdpa.c
> > > > > @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
> > > > >         if (r)
> > > > >           vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
> > > > > +    else
> > > > > +        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
> > > > >         return r;
> > > > >   }
> > > > > @@ -591,14 +593,16 @@ static int
> > > > > vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> > > > >       unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
> > > > >       unsigned int gup_flags = FOLL_LONGTERM;
> > > > >       unsigned long npages, cur_base, map_pfn, last_pfn = 0;
> > > > > -    unsigned long locked, lock_limit, pinned, i;
> > > > > +    unsigned long lock_limit, sz2pin, nchunks, i;
> > > > >       u64 iova = msg->iova;
> > > > > +    long pinned;
> > > > >       int ret = 0;
> > > > >         if (vhost_iotlb_itree_first(iotlb, msg->iova,
> > > > >                       msg->iova + msg->size - 1))
> > > > >           return -EEXIST;
> > > > >   +    /* Limit the use of memory for bookkeeping */
> > > > >       page_list = (struct page **) __get_free_page(GFP_KERNEL);
> > > > >       if (!page_list)
> > > > >           return -ENOMEM;
> > > > > @@ -607,52 +611,75 @@ static int
> > > > > vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> > > > >           gup_flags |= FOLL_WRITE;
> > > > >         npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK))
> > > > > >> PAGE_SHIFT;
> > > > > -    if (!npages)
> > > > > -        return -EINVAL;
> > > > > +    if (!npages) {
> > > > > +        ret = -EINVAL;
> > > > > +        goto free;
> > > > > +    }
> > > > >         mmap_read_lock(dev->mm);
> > > > >   -    locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
> > > > >       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > > > > -
> > > > > -    if (locked > lock_limit) {
> > > > > +    if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> > > > >           ret = -ENOMEM;
> > > > > -        goto out;
> > > > > +        goto unlock;
> > > > >       }
> > > > >         cur_base = msg->uaddr & PAGE_MASK;
> > > > >       iova &= PAGE_MASK;
> > > > > +    nchunks = 0;
> > > > >         while (npages) {
> > > > > -        pinned = min_t(unsigned long, npages, list_size);
> > > > > -        ret = pin_user_pages(cur_base, pinned,
> > > > > -                     gup_flags, page_list, NULL);
> > > > > -        if (ret != pinned)
> > > > > +        sz2pin = min_t(unsigned long, npages, list_size);
> > > > > +        pinned = pin_user_pages(cur_base, sz2pin,
> > > > > +                    gup_flags, page_list, NULL);
> > > > > +        if (sz2pin != pinned) {
> > > > > +            if (pinned < 0) {
> > > > > +                ret = pinned;
> > > > > +            } else {
> > > > > +                unpin_user_pages(page_list, pinned);
> > > > > +                ret = -ENOMEM;
> > > > > +            }
> > > > >               goto out;
> > > > > +        }
> > > > > +        nchunks++;
> > > > >             if (!last_pfn)
> > > > >               map_pfn = page_to_pfn(page_list[0]);
> > > > >   -        for (i = 0; i < ret; i++) {
> > > > > +        for (i = 0; i < pinned; i++) {
> > > > >               unsigned long this_pfn = page_to_pfn(page_list[i]);
> > > > >               u64 csize;
> > > > >                 if (last_pfn && (this_pfn != last_pfn + 1)) {
> > > > >                   /* Pin a contiguous chunk of memory */
> > > > >                   csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
> > > > > -                if (vhost_vdpa_map(v, iova, csize,
> > > > > -                           map_pfn << PAGE_SHIFT,
> > > > > -                           msg->perm))
> > > > > +                ret = vhost_vdpa_map(v, iova, csize,
> > > > > +                             map_pfn << PAGE_SHIFT,
> > > > > +                             msg->perm);
> > > > > +                if (ret) {
> > > > > +                    /*
> > > > > +                     * Unpin the pages that are left unmapped
> > > > > +                     * from this point on in the current
> > > > > +                     * page_list. The remaining outstanding
> > > > > +                     * ones which may stride across several
> > > > > +                     * chunks will be covered in the common
> > > > > +                     * error path subsequently.
> > > > > +                     */
> > > > > +                    unpin_user_pages(&page_list[i],
> > > > > +                             pinned - i);
> > > > 
> > > > 
> > > > Can we simply do last_pfn = this_pfn here?
> > > Nope. They are not contiguous segments of memory. Noted the
> > > conditional (this_pfn != last_pfn + 1) being held here.
> > 
> > 
> > Right.
> > 
> > 
> > > 
> > > > 
> > > > 
> > > > >                       goto out;
> > > > > +                }
> > > > > +
> > > > >                   map_pfn = this_pfn;
> > > > >                   iova += csize;
> > > > > +                nchunks = 0;
> > > > >               }
> > > > >                 last_pfn = this_pfn;
> > > > >           }
> > > > >   -        cur_base += ret << PAGE_SHIFT;
> > > > > -        npages -= ret;
> > > > > +        cur_base += pinned << PAGE_SHIFT;
> > > > > +        npages -= pinned;
> > > > >       }
> > > > >         /* Pin the rest chunk */
> > > > > @@ -660,10 +687,26 @@ static int
> > > > > vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> > > > >                    map_pfn << PAGE_SHIFT, msg->perm);
> > > > >   out:
> > > > >       if (ret) {
> > > > > +        if (nchunks && last_pfn) {
> > > > 
> > > > 
> > > > Any reason for checking last_pfn here?
> > > > 
> > > > Note that we did:
> > > > 
> > > > +        nchunks++;
> > > > 
> > > >          if (!last_pfn)
> > > >              map_pfn = page_to_pfn(page_list[0]);
> > > It's for explicit coding to make sure this common error path can be
> > > reused no matter if last_pfn has a sane value assigned or not. I can
> > > change it to an implicit WARN_ON() if need be.
> > 
> > 
> > Just to make sure I understand. A question, when will we get nchunks !=
> > 0 but last_pfn == 0?
> The current code has implicit assumption that nchunks != 0 infers last_pfn
> != 0. However, this assumption could break subject to code structure changes
> for eg. failure may occur after the increment of nchunks and before the for
> loop. I feel it'd be the best to capture this assumption with something
> explicit.
> 
> -Siwei

An if here isn't really an explicit way to document assumptions,
it's a way to avoid assumptions :)
A way to document assumptions is probably BUG_ON.

> > 
> > Thanks
> > 
> > 
> > > Thanks,
> > > -Siwei
> > > 
> > > > 
> > > > 
> > > > Thanks
> > > > 
> > > > 
> > > > > +            unsigned long pfn;
> > > > > +
> > > > > +            /*
> > > > > +             * Unpin the outstanding pages which are yet to be
> > > > > +             * mapped but haven't due to vdpa_map() or
> > > > > +             * pin_user_pages() failure.
> > > > > +             *
> > > > > +             * Mapped pages are accounted in vdpa_map(), hence
> > > > > +             * the corresponding unpinning will be handled by
> > > > > +             * vdpa_unmap().
> > > > > +             */
> > > > > +            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
> > > > > +                unpin_user_page(pfn_to_page(pfn));
> > > > > +        }
> > > > >           vhost_vdpa_unmap(v, msg->iova, msg->size);
> > > > > -        atomic64_sub(npages, &dev->mm->pinned_vm);
> > > > >       }
> > > > > +unlock:
> > > > >       mmap_read_unlock(dev->mm);
> > > > > +free:
> > > > >       free_page((unsigned long)page_list);
> > > > >       return ret;
> > > > >   }
> > > > 
> > > 
> > 



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-09 23:56           ` si-wei liu
From: si-wei liu @ 2020-11-09 23:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, lingshan.zhu, joao.m.martins, boris.ostrovsky,
	linux-kernel, virtualization


On 11/9/2020 2:42 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 09, 2020 at 01:44:03PM -0800, si-wei liu wrote:
>> On 11/8/2020 7:21 PM, Jason Wang wrote:
>>> On 2020/11/6 6:57 AM, si-wei liu wrote:
>>>> On 11/4/2020 7:26 PM, Jason Wang wrote:
>>>>> On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
>>>>>> Pinned pages are not properly accounted particularly when
>>>>>> mapping error occurs on IOTLB update. Clean up dangling
>>>>>> pinned pages for the error path.
>>>>>>
>>>>>> The memory usage for bookkeeping pinned pages is reverted
>>>>>> to what it was before: only one single free page is needed.
>>>>>> This helps reduce the host memory demand for VM with a large
>>>>>> amount of memory, or in the situation where host is running
>>>>>> short of free memory.
>>>>>>
>>>>>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> ---
>>>>>> Changes in v2:
>>>>>> - Drop the reversion patch
>>>>>> - Fix unhandled page leak towards the end of page_list
>>>>>>
>>>>>>    drivers/vhost/vdpa.c | 79
>>>>>> ++++++++++++++++++++++++++++++++++++++++------------
>>>>>>    1 file changed, 61 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>>>>> index b6d9016..e112854 100644
>>>>>> --- a/drivers/vhost/vdpa.c
>>>>>> +++ b/drivers/vhost/vdpa.c
>>>>>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>>>>>>          if (r)
>>>>>>            vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
>>>>>> +    else
>>>>>> +        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>>>>>>          return r;
>>>>>>    }
>>>>>> @@ -591,14 +593,16 @@ static int
>>>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>>>        unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>>>>>>        unsigned int gup_flags = FOLL_LONGTERM;
>>>>>>        unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>>>>>> -    unsigned long locked, lock_limit, pinned, i;
>>>>>> +    unsigned long lock_limit, sz2pin, nchunks, i;
>>>>>>        u64 iova = msg->iova;
>>>>>> +    long pinned;
>>>>>>        int ret = 0;
>>>>>>          if (vhost_iotlb_itree_first(iotlb, msg->iova,
>>>>>>                        msg->iova + msg->size - 1))
>>>>>>            return -EEXIST;
>>>>>>    +    /* Limit the use of memory for bookkeeping */
>>>>>>        page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>>>>>        if (!page_list)
>>>>>>            return -ENOMEM;
>>>>>> @@ -607,52 +611,75 @@ static int
>>>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>>>            gup_flags |= FOLL_WRITE;
>>>>>>          npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK))
>>>>>>>> PAGE_SHIFT;
>>>>>> -    if (!npages)
>>>>>> -        return -EINVAL;
>>>>>> +    if (!npages) {
>>>>>> +        ret = -EINVAL;
>>>>>> +        goto free;
>>>>>> +    }
>>>>>>          mmap_read_lock(dev->mm);
>>>>>>    -    locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>>>>>>        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>>>>> -
>>>>>> -    if (locked > lock_limit) {
>>>>>> +    if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>>>>>>            ret = -ENOMEM;
>>>>>> -        goto out;
>>>>>> +        goto unlock;
>>>>>>        }
>>>>>>          cur_base = msg->uaddr & PAGE_MASK;
>>>>>>        iova &= PAGE_MASK;
>>>>>> +    nchunks = 0;
>>>>>>          while (npages) {
>>>>>> -        pinned = min_t(unsigned long, npages, list_size);
>>>>>> -        ret = pin_user_pages(cur_base, pinned,
>>>>>> -                     gup_flags, page_list, NULL);
>>>>>> -        if (ret != pinned)
>>>>>> +        sz2pin = min_t(unsigned long, npages, list_size);
>>>>>> +        pinned = pin_user_pages(cur_base, sz2pin,
>>>>>> +                    gup_flags, page_list, NULL);
>>>>>> +        if (sz2pin != pinned) {
>>>>>> +            if (pinned < 0) {
>>>>>> +                ret = pinned;
>>>>>> +            } else {
>>>>>> +                unpin_user_pages(page_list, pinned);
>>>>>> +                ret = -ENOMEM;
>>>>>> +            }
>>>>>>                goto out;
>>>>>> +        }
>>>>>> +        nchunks++;
>>>>>>              if (!last_pfn)
>>>>>>                map_pfn = page_to_pfn(page_list[0]);
>>>>>>    -        for (i = 0; i < ret; i++) {
>>>>>> +        for (i = 0; i < pinned; i++) {
>>>>>>                unsigned long this_pfn = page_to_pfn(page_list[i]);
>>>>>>                u64 csize;
>>>>>>                  if (last_pfn && (this_pfn != last_pfn + 1)) {
>>>>>>                    /* Pin a contiguous chunk of memory */
>>>>>>                    csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>>>>>> -                if (vhost_vdpa_map(v, iova, csize,
>>>>>> -                           map_pfn << PAGE_SHIFT,
>>>>>> -                           msg->perm))
>>>>>> +                ret = vhost_vdpa_map(v, iova, csize,
>>>>>> +                             map_pfn << PAGE_SHIFT,
>>>>>> +                             msg->perm);
>>>>>> +                if (ret) {
>>>>>> +                    /*
>>>>>> +                     * Unpin the pages that are left unmapped
>>>>>> +                     * from this point on in the current
>>>>>> +                     * page_list. The remaining outstanding
>>>>>> +                     * ones which may stride across several
>>>>>> +                     * chunks will be covered in the common
>>>>>> +                     * error path subsequently.
>>>>>> +                     */
>>>>>> +                    unpin_user_pages(&page_list[i],
>>>>>> +                             pinned - i);
>>>>>
>>>>> Can we simply do last_pfn = this_pfn here?
>>>> Nope. They are not contiguous segments of memory. Noted the
>>>> conditional (this_pfn != last_pfn + 1) being held here.
>>>
>>> Right.
>>>
>>>
>>>>>
>>>>>>                        goto out;
>>>>>> +                }
>>>>>> +
>>>>>>                    map_pfn = this_pfn;
>>>>>>                    iova += csize;
>>>>>> +                nchunks = 0;
>>>>>>                }
>>>>>>                  last_pfn = this_pfn;
>>>>>>            }
>>>>>>    -        cur_base += ret << PAGE_SHIFT;
>>>>>> -        npages -= ret;
>>>>>> +        cur_base += pinned << PAGE_SHIFT;
>>>>>> +        npages -= pinned;
>>>>>>        }
>>>>>>          /* Pin the rest chunk */
>>>>>> @@ -660,10 +687,26 @@ static int
>>>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>>>                     map_pfn << PAGE_SHIFT, msg->perm);
>>>>>>    out:
>>>>>>        if (ret) {
>>>>>> +        if (nchunks && last_pfn) {
>>>>>
>>>>> Any reason for checking last_pfn here?
>>>>>
>>>>> Note that we did:
>>>>>
>>>>> +        nchunks++;
>>>>>
>>>>>           if (!last_pfn)
>>>>>               map_pfn = page_to_pfn(page_list[0]);
>>>> It's for explicit coding to make sure this common error path can be
>>>> reused no matter if last_pfn has a sane value assigned or not. I can
>>>> change it to an implicit WARN_ON() if need be.
>>>
>>> Just to make sure I understand. A question, when will we get nchunks !=
>>> 0 but last_pfn == 0?
>> The current code has implicit assumption that nchunks != 0 infers last_pfn
>> != 0. However, this assumption could break subject to code structure changes
>> for eg. failure may occur after the increment of nchunks and before the for
>> loop. I feel it'd be the best to capture this assumption with something
>> explicit.
>>
>> -Siwei
> if here isn't really an explicit way to document assumptions,
> it's a way to avoid assumptions :)
Agreed. I was referring to the v3 patch, which had turned the defensive
coding into a WARN_ON().

> A way to document assumptions is probably BUG_ON.
If you're fine with the below checkpatch warning, I can definitely
convert it to a BUG_ON:

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code 
rather than BUG() or BUG_ON()

Let me know if I need to post a v4 for this nit.
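
For reference, the BUG_ON form of documenting the assumption would be a
one-liner in the error path (a sketch only):

	/* Halt the kernel if nchunks is nonzero while last_pfn is
	 * still 0; checkpatch flags this because a WARN_ON plus
	 * recovery, as sketched earlier in the thread, keeps the
	 * host alive on what is, after all, a bookkeeping bug.
	 */
	BUG_ON(nchunks && !last_pfn);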

Thanks
-Siwei



>
>>> Thanks
>>>
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> +            unsigned long pfn;
>>>>>> +
>>>>>> +            /*
>>>>>> +             * Unpin the outstanding pages which are yet to be
>>>>>> +             * mapped but haven't due to vdpa_map() or
>>>>>> +             * pin_user_pages() failure.
>>>>>> +             *
>>>>>> +             * Mapped pages are accounted in vdpa_map(), hence
>>>>>> +             * the corresponding unpinning will be handled by
>>>>>> +             * vdpa_unmap().
>>>>>> +             */
>>>>>> +            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>>>>>> +                unpin_user_page(pfn_to_page(pfn));
>>>>>> +        }
>>>>>>            vhost_vdpa_unmap(v, msg->iova, msg->size);
>>>>>> -        atomic64_sub(npages, &dev->mm->pinned_vm);
>>>>>>        }
>>>>>> +unlock:
>>>>>>        mmap_read_unlock(dev->mm);
>>>>>> +free:
>>>>>>        free_page((unsigned long)page_list);
>>>>>>        return ret;
>>>>>>    }



* Re: [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework)
@ 2020-11-10  3:27             ` Jason Wang
From: Jason Wang @ 2020-11-10  3:27 UTC (permalink / raw)
  To: si-wei liu, Michael S. Tsirkin
  Cc: lingshan.zhu, joao.m.martins, boris.ostrovsky, linux-kernel,
	virtualization


On 2020/11/10 7:56 AM, si-wei liu wrote:
>
> On 11/9/2020 2:42 PM, Michael S. Tsirkin wrote:
>> On Mon, Nov 09, 2020 at 01:44:03PM -0800, si-wei liu wrote:
>>> On 11/8/2020 7:21 PM, Jason Wang wrote:
>>>> On 2020/11/6 6:57 AM, si-wei liu wrote:
>>>>> On 11/4/2020 7:26 PM, Jason Wang wrote:
>>>>>> On 2020/11/5 7:33 AM, Si-Wei Liu wrote:
>>>>>>> Pinned pages are not properly accounted particularly when
>>>>>>> mapping error occurs on IOTLB update. Clean up dangling
>>>>>>> pinned pages for the error path.
>>>>>>>
>>>>>>> The memory usage for bookkeeping pinned pages is reverted
>>>>>>> to what it was before: only one single free page is needed.
>>>>>>> This helps reduce the host memory demand for VM with a large
>>>>>>> amount of memory, or in the situation where host is running
>>>>>>> short of free memory.
>>>>>>>
>>>>>>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>> ---
>>>>>>> Changes in v2:
>>>>>>> - Drop the reversion patch
>>>>>>> - Fix unhandled page leak towards the end of page_list
>>>>>>>
>>>>>>>    drivers/vhost/vdpa.c | 79
>>>>>>> ++++++++++++++++++++++++++++++++++++++++------------
>>>>>>>    1 file changed, 61 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>>>>>> index b6d9016..e112854 100644
>>>>>>> --- a/drivers/vhost/vdpa.c
>>>>>>> +++ b/drivers/vhost/vdpa.c
>>>>>>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>>>>>>>          if (r)
>>>>>>>            vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 
>>>>>>> 1);
>>>>>>> +    else
>>>>>>> +        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>>>>>>>          return r;
>>>>>>>    }
>>>>>>> @@ -591,14 +593,16 @@ static int
>>>>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>>>>        unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>>>>>>>        unsigned int gup_flags = FOLL_LONGTERM;
>>>>>>>        unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>>>>>>> -    unsigned long locked, lock_limit, pinned, i;
>>>>>>> +    unsigned long lock_limit, sz2pin, nchunks, i;
>>>>>>>        u64 iova = msg->iova;
>>>>>>> +    long pinned;
>>>>>>>        int ret = 0;
>>>>>>>          if (vhost_iotlb_itree_first(iotlb, msg->iova,
>>>>>>>                        msg->iova + msg->size - 1))
>>>>>>>            return -EEXIST;
>>>>>>>    +    /* Limit the use of memory for bookkeeping */
>>>>>>>        page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>>>>>>        if (!page_list)
>>>>>>>            return -ENOMEM;
>>>>>>> @@ -607,52 +611,75 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>>>>            gup_flags |= FOLL_WRITE;
>>>>>>>          npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
>>>>>>> -    if (!npages)
>>>>>>> -        return -EINVAL;
>>>>>>> +    if (!npages) {
>>>>>>> +        ret = -EINVAL;
>>>>>>> +        goto free;
>>>>>>> +    }
>>>>>>>          mmap_read_lock(dev->mm);
>>>>>>>    -    locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>>>>>>>        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>>>>>> -
>>>>>>> -    if (locked > lock_limit) {
>>>>>>> +    if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>>>>>>>            ret = -ENOMEM;
>>>>>>> -        goto out;
>>>>>>> +        goto unlock;
>>>>>>>        }
>>>>>>>          cur_base = msg->uaddr & PAGE_MASK;
>>>>>>>        iova &= PAGE_MASK;
>>>>>>> +    nchunks = 0;
>>>>>>>          while (npages) {
>>>>>>> -        pinned = min_t(unsigned long, npages, list_size);
>>>>>>> -        ret = pin_user_pages(cur_base, pinned,
>>>>>>> -                     gup_flags, page_list, NULL);
>>>>>>> -        if (ret != pinned)
>>>>>>> +        sz2pin = min_t(unsigned long, npages, list_size);
>>>>>>> +        pinned = pin_user_pages(cur_base, sz2pin,
>>>>>>> +                    gup_flags, page_list, NULL);
>>>>>>> +        if (sz2pin != pinned) {
>>>>>>> +            if (pinned < 0) {
>>>>>>> +                ret = pinned;
>>>>>>> +            } else {
>>>>>>> +                unpin_user_pages(page_list, pinned);
>>>>>>> +                ret = -ENOMEM;
>>>>>>> +            }
>>>>>>>                goto out;
>>>>>>> +        }
>>>>>>> +        nchunks++;
>>>>>>>              if (!last_pfn)
>>>>>>>                map_pfn = page_to_pfn(page_list[0]);
>>>>>>>    -        for (i = 0; i < ret; i++) {
>>>>>>> +        for (i = 0; i < pinned; i++) {
>>>>>>>                unsigned long this_pfn = page_to_pfn(page_list[i]);
>>>>>>>                u64 csize;
>>>>>>>                  if (last_pfn && (this_pfn != last_pfn + 1)) {
>>>>>>>                    /* Pin a contiguous chunk of memory */
>>>>>>>                    csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>>>>>>> -                if (vhost_vdpa_map(v, iova, csize,
>>>>>>> -                           map_pfn << PAGE_SHIFT,
>>>>>>> -                           msg->perm))
>>>>>>> +                ret = vhost_vdpa_map(v, iova, csize,
>>>>>>> +                             map_pfn << PAGE_SHIFT,
>>>>>>> +                             msg->perm);
>>>>>>> +                if (ret) {
>>>>>>> +                    /*
>>>>>>> +                     * Unpin the pages that are left unmapped
>>>>>>> +                     * from this point on in the current
>>>>>>> +                     * page_list. The remaining outstanding
>>>>>>> +                     * ones which may stride across several
>>>>>>> +                     * chunks will be covered in the common
>>>>>>> +                     * error path subsequently.
>>>>>>> +                     */
>>>>>>> +                unpin_user_pages(&page_list[i],
>>>>>>> +                             pinned - i);
>>>>>>
>>>>>> Can we simply do last_pfn = this_pfn here?
>>>>> Nope. They are not contiguous segments of memory. Note the
>>>>> conditional (this_pfn != last_pfn + 1) being checked here.
>>>>
>>>> Right.
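To make the "not contiguous" point concrete: the common error path can only walk a single physically contiguous pfn run, while the leftover tail of page_list may itself span several runs, so it has to be released by array index. A toy illustration with made-up pfn values:

    /*
     * Toy illustration (made-up pfns): page_list pinned pfns
     * {200, 201, 57, 58} and vhost_vdpa_map() failed at i == 2,
     * where the run breaks (this_pfn == 57, last_pfn + 1 == 202).
     *
     * The common error path,
     *
     *     for (pfn = map_pfn; pfn <= last_pfn; pfn++)
     *         unpin_user_page(pfn_to_page(pfn));
     *
     * covers 200..201 only; no single [map_pfn, last_pfn] range can
     * also name 57 and 58.  Hence the not-yet-mapped tail is dropped
     * by index instead:
     */
    unpin_user_pages(&page_list[i], pinned - i);    /* pfns 57, 58 */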
>>>>
>>>>
>>>>>>
>>>>>>>                        goto out;
>>>>>>> +                }
>>>>>>> +
>>>>>>>                    map_pfn = this_pfn;
>>>>>>>                    iova += csize;
>>>>>>> +                nchunks = 0;
>>>>>>>                }
>>>>>>>                  last_pfn = this_pfn;
>>>>>>>            }
>>>>>>>    -        cur_base += ret << PAGE_SHIFT;
>>>>>>> -        npages -= ret;
>>>>>>> +        cur_base += pinned << PAGE_SHIFT;
>>>>>>> +        npages -= pinned;
>>>>>>>        }
>>>>>>>          /* Pin the rest chunk */
>>>>>>> @@ -660,10 +687,26 @@ static int
>>>>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>>>>                     map_pfn << PAGE_SHIFT, msg->perm);
>>>>>>>    out:
>>>>>>>        if (ret) {
>>>>>>> +        if (nchunks && last_pfn) {
>>>>>>
>>>>>> Any reason for checking last_pfn here?
>>>>>>
>>>>>> Note that we did:
>>>>>>
>>>>>> +        nchunks++;
>>>>>>
>>>>>>           if (!last_pfn)
>>>>>>               map_pfn = page_to_pfn(page_list[0]);
>>>>> It's explicit coding to make sure this common error path can be
>>>>> reused whether or not last_pfn has a sane value assigned. I can
>>>>> change it to a WARN_ON() if need be.
>>>>
>>>> Just to make sure I understand. A question: when would we get
>>>> nchunks != 0 but last_pfn == 0?
>>> The current code implicitly assumes that nchunks != 0 implies
>>> last_pfn != 0. However, that assumption could break under future code
>>> restructuring; for example, a failure might occur after the increment
>>> of nchunks but before the for loop. I feel it'd be best to capture
>>> this assumption with something explicit.
>>>
>>> -Siwei
>> An "if" here isn't really an explicit way to document assumptions;
>> it's a way to avoid assumptions :)
> Agreed. I was referring to the v3 patch, which had turned the defensive
> coding into a WARN_ON().
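For reference, the v3 direction being discussed can be sketched as follows; this illustrates swapping the defensive last_pfn check for an explicit warning, and may differ in detail from the actual v3 hunk:

out:
    if (ret) {
        if (nchunks) {
            unsigned long pfn;

            /*
             * nchunks != 0 is assumed to imply a valid
             * [map_pfn, last_pfn] run; warn loudly if a
             * future restructuring ever breaks that.
             */
            WARN_ON(!last_pfn);
            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
                unpin_user_page(pfn_to_page(pfn));
        }
        vhost_vdpa_unmap(v, msg->iova, msg->size);
    }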


I think I get you now.


>
>> A way to document assumptions is probably BUG_ON.
> If you're fine with the checkpatch warning below, I can definitely
> convert it to a BUG_ON():
>
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code 
> rather than BUG() or BUG_ON()
>
> Let me know if I need to post a v4 for this nit.


I will ack V3.

Thanks
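As background on that checkpatch warning: WARN_ON() prints a stack trace and lets the kernel keep running, so it can be paired with recovery code, whereas BUG_ON() crashes the kernel at that point. A minimal contrast; the recover label here is hypothetical, purely for illustration:

    /* The pattern checkpatch recommends: warn, then try to recover. */
    if (WARN_ON(!last_pfn))
        goto recover;               /* hypothetical recovery path */

    /* The pattern it warns against: nothing can run past a failure. */
    BUG_ON(!last_pfn);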


>
> Thanks
> -Siwei
>
>
>
>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks,
>>>>> -Siwei
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +            unsigned long pfn;
>>>>>>> +
>>>>>>> +            /*
>>>>>>> +             * Unpin the outstanding pages which are yet to be
>>>>>>> +             * mapped but haven't due to vdpa_map() or
>>>>>>> +             * pin_user_pages() failure.
>>>>>>> +             *
>>>>>>> +             * Mapped pages are accounted in vdpa_map(), hence
>>>>>>> +             * the corresponding unpinning will be handled by
>>>>>>> +             * vdpa_unmap().
>>>>>>> +             */
>>>>>>> +            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>>>>>>> +                unpin_user_page(pfn_to_page(pfn));
>>>>>>> +        }
>>>>>>>            vhost_vdpa_unmap(v, msg->iova, msg->size);
>>>>>>> -        atomic64_sub(npages, &dev->mm->pinned_vm);
>>>>>>>        }
>>>>>>> +unlock:
>>>>>>>        mmap_read_unlock(dev->mm);
>>>>>>> +free:
>>>>>>>        free_page((unsigned long)page_list);
>>>>>>>        return ret;
>>>>>>>    }
>
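One design note worth pulling out of the quoted diff: the up-front RLIMIT_MEMLOCK check now only reads pinned_vm (so a failed check needs no rollback), and the counter is bumped solely on the success path of vhost_vdpa_map(). That is why the old atomic64_sub(npages, ...) in the error path can simply be dropped. Condensed from the hunk against vhost_vdpa_map():

    /* Tail of vhost_vdpa_map(): account only once the map succeeded,
     * so every increment of pinned_vm corresponds to mapped pages
     * that vhost_vdpa_unmap() will later unpin and un-account.
     */
    if (r)
        vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
    else
        atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);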



Thread overview: 8 messages
2020-11-04 23:33 [PATCH v2] vhost-vdpa: fix page pinning leakage in error path (rework) Si-Wei Liu
2020-11-05  3:26 ` Jason Wang
2020-11-05 22:57   ` si-wei liu
2020-11-09  3:21     ` Jason Wang
2020-11-09 21:44       ` si-wei liu
2020-11-09 22:42         ` Michael S. Tsirkin
2020-11-09 23:56           ` si-wei liu
2020-11-10  3:27             ` Jason Wang
