From: Vijayanand Jitta <vjitta@codeaurora.org>
To: Robin Murphy <robin.murphy@arm.com>, joro@8bytes.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Cc: vinmenon@codeaurora.org, kernel-team@android.com
Subject: Re: [PATCH] iommu/iova: Retry from last rb tree node if iova search fails
Date: Mon, 11 May 2020 16:44:06 +0530
Message-ID: <821c666b-ddf8-8b5c-1e8c-69a06ae1c727@codeaurora.org>
In-Reply-To: <b80fdf37-e635-2d65-c523-8e1d0bd8085b@codeaurora.org>

On 5/9/2020 12:25 AM, Vijayanand Jitta wrote:
>
> On 5/7/2020 6:54 PM, Robin Murphy wrote:
>> On 2020-05-06 9:01 pm, vjitta@codeaurora.org wrote:
>>> From: Vijayanand Jitta <vjitta@codeaurora.org>
>>>
>>> Whenever a new iova alloc request comes in, the iova is always
>>> searched from the cached node and the nodes previous to the cached
>>> node. So, even if there is free iova space available in the nodes
>>> next to the cached node, iova allocation can still fail with this
>>> approach.
>>>
>>> Consider the following sequence of iova allocs and frees on
>>> 1GB of iova space:
>>>
>>> 1) alloc - 500MB
>>> 2) alloc - 12MB
>>> 3) alloc - 499MB
>>> 4) free  - 12MB which was allocated in step 2
>>> 5) alloc - 13MB
>>>
>>> After the above sequence we will have 12MB of free iova space, and
>>> the cached node will be pointing to the iova pfn of the last 13MB
>>> alloc, which is the lowest iova pfn of that iova space. Now if we
>>> get an alloc request of 2MB, we just search from the cached node
>>> and then look at lower iova pfns for free iova, and as there aren't
>>> any, the iova alloc fails even though there is 12MB of free iova
>>> space.
>>
>> Yup, this could definitely do with improving. Unfortunately I think
>> this particular implementation is slightly flawed...
>>
>>> To avoid such iova search failures, retry from the last rb tree
>>> node when the iova search fails; this will search the entire tree
>>> and get an iova if one is available.
>>>
>>> Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
>>> ---
>>>   drivers/iommu/iova.c | 11 +++++++++++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 0e6a953..2985222 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>  	unsigned long flags;
>>>  	unsigned long new_pfn;
>>>  	unsigned long align_mask = ~0UL;
>>> +	bool retry = false;
>>>  
>>>  	if (size_aligned)
>>>  		align_mask <<= fls_long(size - 1);
>>> @@ -198,6 +199,8 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>  	curr = __get_cached_rbnode(iovad, limit_pfn);
>>>  	curr_iova = rb_entry(curr, struct iova, node);
>>> +
>>> +retry_search:
>>>  	do {
>>>  		limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>>>  		new_pfn = (limit_pfn - size) & align_mask;
>>> @@ -207,6 +210,14 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>  	} while (curr && new_pfn <= curr_iova->pfn_hi);
>>>  
>>>  	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>> +		if (!retry) {
>>> +			curr = rb_last(&iovad->rbroot);
>>
>> Why walk when there's an anchor node there already? However...
>>
>>> +			curr_iova = rb_entry(curr, struct iova, node);
>>> +			limit_pfn = curr_iova->pfn_lo;
>>
>> ...this doesn't look right, as by now we've lost the original
>> limit_pfn supplied by the caller, so are highly likely to allocate
>> beyond the range our caller asked for. In fact AFAICS we'd start
>> allocating from directly below the anchor node, beyond the end of
>> the entire address space.
>>
>> The logic I was imagining we want here was something like the
>> rapidly hacked up (and untested) diff below.
>>
>> Thanks,
>> Robin.
>>
>
> Thanks for your comments. I have gone through the logic below, and I
> see an issue with the retry check: there could be a case where
> alloc_lo is set to some pfn other than start_pfn, and in that case we
> don't retry even though there can still be iova available. I
> understand it's a hacked-up version; I can work on this.
>
> But how about we just store limit_pfn, get the node using that, and
> retry once from that node? It would be similar to my patch, just
> correcting the curr node and the limit_pfn update in the retry check.
> Do you see any issue with this approach?
>
> Thanks,
> Vijay.

I found one issue with my earlier approach, where we search twice from
the cached node down to start_pfn. This can be avoided if we store the
pfn_hi of the cached node and make that the alloc_lo when we retry. I
see the diff below also does the same. I have posted a v2 of the patch
after going through the comments and the diff below; can you please
review that?

Thanks,
Vijay

>> ----->8-----
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 0e6a9536eca6..3574c19272d6 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>  	unsigned long flags;
>>  	unsigned long new_pfn;
>>  	unsigned long align_mask = ~0UL;
>> +	unsigned long alloc_hi, alloc_lo;
>>  
>>  	if (size_aligned)
>>  		align_mask <<= fls_long(size - 1);
>> @@ -196,17 +197,27 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>  	    size >= iovad->max32_alloc_size)
>>  		goto iova32_full;
>>  
>> +	alloc_hi = IOVA_ANCHOR;
>> +	alloc_lo = iovad->start_pfn;
>> +retry:
>>  	curr = __get_cached_rbnode(iovad, limit_pfn);
>>  	curr_iova = rb_entry(curr, struct iova, node);
>> +	if (alloc_hi < curr_iova->pfn_hi) {
>> +		alloc_lo = curr_iova->pfn_hi;
>> +		alloc_hi = limit_pfn;
>> +	}
>> +
>>  	do {
>> -		limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
>> -		new_pfn = (limit_pfn - size) & align_mask;
>> +		alloc_hi = min(alloc_hi, curr_iova->pfn_lo);
>> +		new_pfn = (alloc_hi - size) & align_mask;
>>  		prev = curr;
>>  		curr = rb_prev(curr);
>>  		curr_iova = rb_entry(curr, struct iova, node);
>>  	} while (curr && new_pfn <= curr_iova->pfn_hi);
>>  
>> -	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>> +	if (limit_pfn < size || new_pfn < alloc_lo) {
>> +		if (alloc_lo == iovad->start_pfn)
>> +			goto retry;
>>  		iovad->max32_alloc_size = size;
>>  		goto iova32_full;
>>  	}
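
As an aside for readers reconstructing the failure mode: below is a
minimal user-space C sketch, an editorial illustration rather than code
from this thread or from the kernel. The used[] array, search_down()
and the MB units are hypothetical stand-ins for the iova rb-tree,
__alloc_and_insert_iova_range() and pfns. It hardcodes the tree state
left by the commit log's 1GB sequence and contrasts the "cached node
and below" walk, which fails, with a retry over the whole range, which
finds the 12MB hole.

#include <stdio.h>

struct range { long lo, hi; };	/* an allocated range [lo, hi), in MB */

/*
 * Tree state after the commit log's sequence, as a top-down walk
 * would visit it: the only free space is the 12MB hole [512, 524).
 */
static const struct range used[] = {
	{ 524, 1024 },	/* step 1: 500MB */
	{  13,  512 },	/* step 3: 499MB */
	{   0,   13 },	/* step 5: 13MB -- the cached node */
};
static const int nranges = 3;

/*
 * Walk the free gaps from used[start] downwards, mimicking the
 * kernel's top-down search; returns an allocation base for 'size'
 * below 'limit', or -1 on failure.
 */
static long search_down(int start, long limit, long size)
{
	for (int i = start; i < nranges; i++) {
		if (used[i].lo < limit)
			limit = used[i].lo;	/* clamp below this range */
		long bot = (i + 1 < nranges) ? used[i + 1].hi : 0;
		if (limit - bot >= size)
			return limit - size;	/* the gap fits */
	}
	return -1;
}

int main(void)
{
	/* A 2MB request starting at the cached node (the 13MB alloc at
	 * the bottom) fails: no free space exists at or below it. */
	printf("from cached node: %ld\n", search_down(2, 1024, 2));

	/* Retrying from the top of the tree, with the caller's limit
	 * preserved, allocates from the hole at [512, 524). */
	printf("full retry:       %ld\n", search_down(0, 1024, 2));
	return 0;
}

Built with any C99 compiler, this prints -1 for the cached-node walk
and 522 for the retry, i.e. an allocation inside the [512, 524) hole;
this is exactly the situation the patch under review tries to rescue.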