Re: [PATCH v3 2/2] vhost-vdpa: fix page pinning leakage in error path

From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	si-wei liu <si-wei.liu@oracle.com>
Cc: lingshan.zhu@intel.com, joao.m.martins@oracle.com,
	boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH v3 2/2] vhost-vdpa: fix page pinning leakage in error path
Date: Wed, 14 Oct 2020 19:41:17 +0800	[thread overview]
Message-ID: <06322c3a-24b1-1fc7-6914-57a920271738@redhat.com> (raw)
In-Reply-To: <20201014025025-mutt-send-email-mst@kernel.org>

On 2020/10/14 下午2:52, Michael S. Tsirkin wrote:
> On Tue, Oct 13, 2020 at 04:42:59PM -0700, si-wei liu wrote:
>> On 10/9/2020 7:27 PM, Jason Wang wrote:
>>> On 2020/10/3 下午1:02, Si-Wei Liu wrote:
>>>> Pinned pages are not properly accounted particularly when
>>>> mapping error occurs on IOTLB update. Clean up dangling
>>>> pinned pages for the error path. As the inflight pinned
>>>> pages, specifically for memory region that strides across
>>>> multiple chunks, would need more than one free page for
>>>> book keeping and accounting. For simplicity, pin pages
>>>> for all memory in the IOVA range in one go rather than
>>>> have multiple pin_user_pages calls to make up the entire
>>>> region. This way it's easier to track and account the
>>>> pages already mapped, particularly for clean-up in the
>>>> error path.
>>>>
>>>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@oracle.com>
>>>> ---
>>>> Changes in v3:
>>>> - Factor out vhost_vdpa_map() change to a separate patch
>>>>
>>>> Changes in v2:
>>>> - Fix incorrect target SHA1 referenced
>>>>
>>>>    drivers/vhost/vdpa.c | 119
>>>> ++++++++++++++++++++++++++++++---------------------
>>>>    1 file changed, 71 insertions(+), 48 deletions(-)
>>>>
>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>>> index 0f27919..dad41dae 100644
>>>> --- a/drivers/vhost/vdpa.c
>>>> +++ b/drivers/vhost/vdpa.c
>>>> @@ -595,21 +595,19 @@ static int
>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>        struct vhost_dev *dev = &v->vdev;
>>>>        struct vhost_iotlb *iotlb = dev->iotlb;
>>>>        struct page **page_list;
>>>> -    unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>>>> +    struct vm_area_struct **vmas;
>>>>        unsigned int gup_flags = FOLL_LONGTERM;
>>>> -    unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>>>> -    unsigned long locked, lock_limit, pinned, i;
>>>> +    unsigned long map_pfn, last_pfn = 0;
>>>> +    unsigned long npages, lock_limit;
>>>> +    unsigned long i, nmap = 0;
>>>>        u64 iova = msg->iova;
>>>> +    long pinned;
>>>>        int ret = 0;
>>>>          if (vhost_iotlb_itree_first(iotlb, msg->iova,
>>>>                        msg->iova + msg->size - 1))
>>>>            return -EEXIST;
>>>>    -    page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>>> -    if (!page_list)
>>>> -        return -ENOMEM;
>>>> -
>>>>        if (msg->perm & VHOST_ACCESS_WO)
>>>>            gup_flags |= FOLL_WRITE;
>>>>    @@ -617,61 +615,86 @@ static int
>>>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>>>>        if (!npages)
>>>>            return -EINVAL;
>>>>    +    page_list = kvmalloc_array(npages, sizeof(struct page *),
>>>> GFP_KERNEL);
>>>> +    vmas = kvmalloc_array(npages, sizeof(struct vm_area_struct *),
>>>> +                  GFP_KERNEL);
>>> This will result high order memory allocation which was what the code
>>> tried to avoid originally.
>>>
>>> Using an unlimited size will cause a lot of side effects consider VM or
>>> userspace may try to pin several TB of memory.
>> Hmmm, that's a good point. Indeed, if the guest memory demand is huge or the
>> host system is running short of free pages, kvmalloc will be problematic and
>> less efficient than the __get_free_page implementation.
> OK so ... Jason, what's the plan?
>
> How about you send a patchset with
> 1. revert this change
> 2. fix error handling leak

Work for me, but it looks like siwei want to do this.

So it's better for to send the patchset.

Thanks

>
>