iommu.lists.linux-foundation.org archive mirror
* [PATCH 1/2] iommu/iova: Add rbtree entry helper
@ 2021-03-05 16:35 Robin Murphy
  2021-03-05 16:35 ` [PATCH 2/2] iommu/iova: Improve restart logic Robin Murphy
  2021-03-18 10:01 ` [PATCH 1/2] iommu/iova: Add rbtree entry helper Joerg Roedel
  0 siblings, 2 replies; 7+ messages in thread
From: Robin Murphy @ 2021-03-05 16:35 UTC (permalink / raw)
  To: joro; +Cc: vjitta, iommu, linux-kernel

Repeating the rb_entry() boilerplate all over the place gets old fast.
Before adding yet more instances, add a little helper to tidy it up.
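
For reference, rb_entry() is just container_of(), so the new helper boils
down to the equivalent sketch below:

	static struct iova *to_iova(struct rb_node *node)
	{
		/* rb_entry(ptr, type, member) is container_of(ptr, type, member) */
		return container_of(node, struct iova, node);
	}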

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iova.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index e6e2fa85271c..c28003e1d2ee 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -27,6 +27,11 @@ static void fq_destroy_all_entries(struct iova_domain *iovad);
 static void fq_flush_timeout(struct timer_list *t);
 static void free_global_cached_iovas(struct iova_domain *iovad);
 
+static struct iova *to_iova(struct rb_node *node)
+{
+	return rb_entry(node, struct iova, node);
+}
+
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	unsigned long start_pfn)
@@ -136,7 +141,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 {
 	struct iova *cached_iova;
 
-	cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
+	cached_iova = to_iova(iovad->cached32_node);
 	if (free == cached_iova ||
 	    (free->pfn_hi < iovad->dma_32bit_pfn &&
 	     free->pfn_lo >= cached_iova->pfn_lo)) {
@@ -144,7 +149,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 		iovad->max32_alloc_size = iovad->dma_32bit_pfn;
 	}
 
-	cached_iova = rb_entry(iovad->cached_node, struct iova, node);
+	cached_iova = to_iova(iovad->cached_node);
 	if (free->pfn_lo >= cached_iova->pfn_lo)
 		iovad->cached_node = rb_next(&free->node);
 }
@@ -159,7 +164,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
 	new = (start) ? &start : &(root->rb_node);
 	/* Figure out where to put new node */
 	while (*new) {
-		struct iova *this = rb_entry(*new, struct iova, node);
+		struct iova *this = to_iova(*new);
 
 		parent = *new;
 
@@ -198,7 +203,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 		goto iova32_full;
 
 	curr = __get_cached_rbnode(iovad, limit_pfn);
-	curr_iova = rb_entry(curr, struct iova, node);
+	curr_iova = to_iova(curr);
 	retry_pfn = curr_iova->pfn_hi + 1;
 
 retry:
@@ -207,7 +212,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 		new_pfn = (high_pfn - size) & align_mask;
 		prev = curr;
 		curr = rb_prev(curr);
-		curr_iova = rb_entry(curr, struct iova, node);
+		curr_iova = to_iova(curr);
 	} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
 
 	if (high_pfn < size || new_pfn < low_pfn) {
@@ -215,7 +220,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 			high_pfn = limit_pfn;
 			low_pfn = retry_pfn;
 			curr = &iovad->anchor.node;
-			curr_iova = rb_entry(curr, struct iova, node);
+			curr_iova = to_iova(curr);
 			goto retry;
 		}
 		iovad->max32_alloc_size = size;
@@ -331,7 +336,7 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
 	assert_spin_locked(&iovad->iova_rbtree_lock);
 
 	while (node) {
-		struct iova *iova = rb_entry(node, struct iova, node);
+		struct iova *iova = to_iova(node);
 
 		if (pfn < iova->pfn_lo)
 			node = node->rb_left;
@@ -617,7 +622,7 @@ static int
 __is_range_overlap(struct rb_node *node,
 	unsigned long pfn_lo, unsigned long pfn_hi)
 {
-	struct iova *iova = rb_entry(node, struct iova, node);
+	struct iova *iova = to_iova(node);
 
 	if ((pfn_lo <= iova->pfn_hi) && (pfn_hi >= iova->pfn_lo))
 		return 1;
@@ -685,7 +690,7 @@ reserve_iova(struct iova_domain *iovad,
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	for (node = rb_first(&iovad->rbroot); node; node = rb_next(node)) {
 		if (__is_range_overlap(node, pfn_lo, pfn_hi)) {
-			iova = rb_entry(node, struct iova, node);
+			iova = to_iova(node);
 			__adjust_overlap_range(iova, &pfn_lo, &pfn_hi);
 			if ((pfn_lo >= iova->pfn_lo) &&
 				(pfn_hi <= iova->pfn_hi))
-- 
2.17.1


* [PATCH 2/2] iommu/iova: Improve restart logic
  2021-03-05 16:35 [PATCH 1/2] iommu/iova: Add rbtree entry helper Robin Murphy
@ 2021-03-05 16:35 ` Robin Murphy
  2021-03-09 15:55   ` John Garry
  2021-03-18 10:01 ` [PATCH 1/2] iommu/iova: Add rbtree entry helper Joerg Roedel
  1 sibling, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2021-03-05 16:35 UTC (permalink / raw)
  To: joro; +Cc: vjitta, iommu, linux-kernel

When restarting after searching below the cached node fails, resetting
the start point to the anchor node is often overly pessimistic. If
allocations are made with mixed limits - particularly in the case of the
opportunistic 32-bit allocation for PCI devices - this could mean
significant time wasted walking through the whole populated upper range
just to reach the initial limit. We can improve on that by implementing
a proper tree traversal to find the first node above the relevant limit,
and set the exact start point.
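
To illustrate the traversal with invented PFNs (anchor node omitted):

	/*
	 *                 [0x80000-0x9ffff]
	 *                 /               \
	 *      [0x20000-0x3ffff]    [0x120000-0x13ffff]
	 *                           /
	 *                  [0xc0000-0xdffff]
	 *
	 * With limit_pfn = 0x100000, the first loop walks right past the
	 * root (pfn_hi 0x9ffff < limit) and stops at [0x120000-0x13ffff];
	 * its left child's pfn_lo 0xc0000 is below the limit, and that
	 * subtree has no right descendant above the limit either, so the
	 * result is [0x120000-0x13ffff] - the lowest node lying wholly
	 * above the limit, which is exactly where the downward allocation
	 * walk should begin instead of at the anchor.
	 */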

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iova.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index c28003e1d2ee..471c48dd71e7 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -154,6 +154,43 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 		iovad->cached_node = rb_next(&free->node);
 }
 
+static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
+{
+	struct rb_node *node, *next;
+	/*
+	 * Ideally what we'd like to judge here is whether limit_pfn is close
+	 * enough to the highest-allocated IOVA that starting the allocation
+	 * walk from the anchor node will be quicker than this initial work to
+	 * find an exact starting point (especially if that ends up being the
+	 * anchor node anyway). This is an incredibly crude approximation which
+	 * only really helps the most likely case, but is at least trivially easy.
+	 */
+	if (limit_pfn > iovad->dma_32bit_pfn)
+		return &iovad->anchor.node;
+
+	node = iovad->rbroot.rb_node;
+	while (to_iova(node)->pfn_hi < limit_pfn)
+		node = node->rb_right;
+
+search_left:
+	while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
+		node = node->rb_left;
+
+	if (!node->rb_left)
+		return node;
+
+	next = node->rb_left;
+	while (next->rb_right) {
+		next = next->rb_right;
+		if (to_iova(next)->pfn_lo >= limit_pfn) {
+			node = next;
+			goto search_left;
+		}
+	}
+
+	return node;
+}
+
 /* Insert the iova into domain rbtree by holding writer lock */
 static void
 iova_insert_rbtree(struct rb_root *root, struct iova *iova,
@@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 		if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
 			high_pfn = limit_pfn;
 			low_pfn = retry_pfn;
-			curr = &iovad->anchor.node;
+			curr = iova_find_limit(iovad, limit_pfn);
 			curr_iova = to_iova(curr);
 			goto retry;
 		}
-- 
2.17.1


* Re: [PATCH 2/2] iommu/iova: Improve restart logic
  2021-03-05 16:35 ` [PATCH 2/2] iommu/iova: Improve restart logic Robin Murphy
@ 2021-03-09 15:55   ` John Garry
       [not found]     ` <d8e80756-a628-3a0d-77ac-1e9df734f1c5@huawei.com>
  0 siblings, 1 reply; 7+ messages in thread
From: John Garry @ 2021-03-09 15:55 UTC (permalink / raw)
  To: Robin Murphy, joro; +Cc: Linuxarm, linux-kernel, iommu, vjitta

On 05/03/2021 16:35, Robin Murphy wrote:

Hi Robin,

> When restarting after searching below the cached node fails, resetting
> the start point to the anchor node is often overly pessimistic. If
> allocations are made with mixed limits - particularly in the case of the
> opportunistic 32-bit allocation for PCI devices - this could mean
> significant time wasted walking through the whole populated upper range
> just to reach the initial limit. 

Right

> We can improve on that by implementing
> a proper tree traversal to find the first node above the relevant limit,
> and set the exact start point.

Thanks for this. However, as mentioned in the other thread, this does 
not help our performance regression Re: commit 4e89dce72521.

And to mention this "retry" approach again, in case it was missed: in my
experiments on the affected HW it also generally has a dreadfully low
success rate - less than 1% for the 32b range retry. The retry rate is
about 20%, so of that 20%, less than 1% succeed.

Failing faster sounds key.

Thanks,
John

> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/iova.c | 39 ++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index c28003e1d2ee..471c48dd71e7 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -154,6 +154,43 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>   		iovad->cached_node = rb_next(&free->node);
>   }
>   
> +static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
> +{
> +	struct rb_node *node, *next;
> +	/*
> +	 * Ideally what we'd like to judge here is whether limit_pfn is close
> +	 * enough to the highest-allocated IOVA that starting the allocation
> +	 * walk from the anchor node will be quicker than this initial work to
> +	 * find an exact starting point (especially if that ends up being the
> +	 * anchor node anyway). This is an incredibly crude approximation which
> +	 * only really helps the most likely case, but is at least trivially easy.
> +	 */
> +	if (limit_pfn > iovad->dma_32bit_pfn)
> +		return &iovad->anchor.node;
> +
> +	node = iovad->rbroot.rb_node;
> +	while (to_iova(node)->pfn_hi < limit_pfn)
> +		node = node->rb_right;
> +
> +search_left:
> +	while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
> +		node = node->rb_left;
> +
> +	if (!node->rb_left)
> +		return node;
> +
> +	next = node->rb_left;
> +	while (next->rb_right) {
> +		next = next->rb_right;
> +		if (to_iova(next)->pfn_lo >= limit_pfn) {
> +			node = next;
> +			goto search_left;
> +		}
> +	}
> +
> +	return node;
> +}
> +
>   /* Insert the iova into domain rbtree by holding writer lock */
>   static void
>   iova_insert_rbtree(struct rb_root *root, struct iova *iova,
> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   		if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
>   			high_pfn = limit_pfn;
>   			low_pfn = retry_pfn;
> -			curr = &iovad->anchor.node;
> +			curr = iova_find_limit(iovad, limit_pfn);
>   			curr_iova = to_iova(curr);
>   			goto retry;
>   		}
> 


* Re: [PATCH 1/2] iommu/iova: Add rbtree entry helper
  2021-03-05 16:35 [PATCH 1/2] iommu/iova: Add rbtree entry helper Robin Murphy
  2021-03-05 16:35 ` [PATCH 2/2] iommu/iova: Improve restart logic Robin Murphy
@ 2021-03-18 10:01 ` Joerg Roedel
  1 sibling, 0 replies; 7+ messages in thread
From: Joerg Roedel @ 2021-03-18 10:01 UTC (permalink / raw)
  To: Robin Murphy; +Cc: vjitta, iommu, linux-kernel

On Fri, Mar 05, 2021 at 04:35:22PM +0000, Robin Murphy wrote:
> Repeating the rb_entry() boilerplate all over the place gets old fast.
> Before adding yet more instances, add a little helper to tidy it up.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/iova.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)

Applied both, thanks Robin.

* Re: [PATCH 2/2] iommu/iova: Improve restart logic
       [not found]     ` <d8e80756-a628-3a0d-77ac-1e9df734f1c5@huawei.com>
@ 2021-03-18 11:38       ` John Garry
  2021-03-18 13:20         ` Robin Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: John Garry @ 2021-03-18 11:38 UTC (permalink / raw)
  To: Robin Murphy, joro; +Cc: vjitta, iommu, Linuxarm, linux-kernel

On 10/03/2021 17:47, John Garry wrote:
> On 09/03/2021 15:55, John Garry wrote:
>> On 05/03/2021 16:35, Robin Murphy wrote:
>>
>> Hi Robin,
>>
>>> When restarting after searching below the cached node fails, resetting
>>> the start point to the anchor node is often overly pessimistic. If
>>> allocations are made with mixed limits - particularly in the case of the
>>> opportunistic 32-bit allocation for PCI devices - this could mean
>>> significant time wasted walking through the whole populated upper range
>>> just to reach the initial limit. 
>>
>> Right
>>
>>> We can improve on that by implementing
>>> a proper tree traversal to find the first node above the relevant limit,
>>> and set the exact start point.
>>
>> Thanks for this. However, as mentioned in the other thread, this does 
>> not help our performance regression Re: commit 4e89dce72521.
>>
>> And mentioning this "retry" approach again, in case it was missed, 
>> from my experiment on the affected HW, it also has generally a 
>> dreadfully low success rate - less than 1% for the 32b range retry. 
>> Retry rate is about 20%. So I am saying from this 20%, less than 1% of 
>> those succeed.
>>
> 
> So since the retry means that we search through the complete pfn range 
> most of the time (due to poor success rate), we should be able to do a 
> better job at maintaining an accurate max alloc size, by calculating it 
> from the range search, and not relying on max alloc failed or resetting 
> it frequently. Hopefully that would mean that we're smarter about not 
> trying the allocation.

So I tried that out and we seem to be able to claw back an appreciable
amount of performance. Maybe 80% of original, with another change,
below.
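
Roughly what I mean, as a sketch only (not the actual change I tried, and
the names here are invented; alignment and the limit clamping a real
version would need are ignored):

	/* Track the largest free gap between neighbouring allocations seen
	 * while the failed search walks down through the range anyway. */
	static void iova_note_gap(unsigned long *max_gap,
				  unsigned long prev_pfn_hi,
				  unsigned long next_pfn_lo)
	{
		unsigned long gap = next_pfn_lo - prev_pfn_hi - 1;

		if (gap > *max_gap)
			*max_gap = gap;
	}

	/*
	 * Then, when the full-range retry fails, set
	 *	iovad->max32_alloc_size = max_gap + 1;
	 * rather than the requested size, so the next allocation larger
	 * than any gap which actually exists can fail immediately.
	 */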

Thanks,
John

> 
> 
>>
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   drivers/iommu/iova.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 38 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index c28003e1d2ee..471c48dd71e7 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -154,6 +154,43 @@ __cached_rbnode_delete_update(struct iova_domain 
>>> *iovad, struct iova *free)
>>>           iovad->cached_node = rb_next(&free->node);
>>>   }
>>> +static struct rb_node *iova_find_limit(struct iova_domain *iovad, 
>>> unsigned long limit_pfn)
>>> +{
>>> +    struct rb_node *node, *next;
>>> +    /*
>>> +     * Ideally what we'd like to judge here is whether limit_pfn is 
>>> close
>>> +     * enough to the highest-allocated IOVA that starting the 
>>> allocation
>>> +     * walk from the anchor node will be quicker than this initial 
>>> work to
>>> +     * find an exact starting point (especially if that ends up 
>>> being the
>>> +     * anchor node anyway). This is an incredibly crude 
>>> approximation which
>>> +     * only really helps the most likely case, but is at least 
>>> trivially easy.
>>> +     */
>>> +    if (limit_pfn > iovad->dma_32bit_pfn)
>>> +        return &iovad->anchor.node;
>>> +
>>> +    node = iovad->rbroot.rb_node;
>>> +    while (to_iova(node)->pfn_hi < limit_pfn)
>>> +        node = node->rb_right;
>>> +
>>> +search_left:
>>> +    while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= 
>>> limit_pfn)
>>> +        node = node->rb_left;
>>> +
>>> +    if (!node->rb_left)
>>> +        return node;
>>> +
>>> +    next = node->rb_left;
>>> +    while (next->rb_right) {
>>> +        next = next->rb_right;
>>> +        if (to_iova(next)->pfn_lo >= limit_pfn) {
>>> +            node = next;
>>> +            goto search_left;
>>> +        }
>>> +    }
>>> +
>>> +    return node;
>>> +}
>>> +
>>>   /* Insert the iova into domain rbtree by holding writer lock */
>>>   static void
>>>   iova_insert_rbtree(struct rb_root *root, struct iova *iova,
>>> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct 
>>> iova_domain *iovad,
>>>           if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
>>>               high_pfn = limit_pfn;
>>>               low_pfn = retry_pfn;
>>> -            curr = &iovad->anchor.node;
>>> +            curr = iova_find_limit(iovad, limit_pfn);


I see that it is now applied. However, alternatively could we just add a 
zero-length 32b boundary marker node for the 32b pfn restart point?

>>>               curr_iova = to_iova(curr);
>>>               goto retry;
>>>           }
>>>
>>
> 


* Re: [PATCH 2/2] iommu/iova: Improve restart logic
  2021-03-18 11:38       ` John Garry
@ 2021-03-18 13:20         ` Robin Murphy
  2021-03-18 16:07           ` John Garry
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2021-03-18 13:20 UTC (permalink / raw)
  To: John Garry, joro; +Cc: vjitta, iommu, Linuxarm, linux-kernel

On 2021-03-18 11:38, John Garry wrote:
> On 10/03/2021 17:47, John Garry wrote:
>> On 09/03/2021 15:55, John Garry wrote:
>>> On 05/03/2021 16:35, Robin Murphy wrote:
>>>
>>> Hi Robin,
>>>
>>>> When restarting after searching below the cached node fails, resetting
>>>> the start point to the anchor node is often overly pessimistic. If
>>>> allocations are made with mixed limits - particularly in the case of 
>>>> the
>>>> opportunistic 32-bit allocation for PCI devices - this could mean
>>>> significant time wasted walking through the whole populated upper range
>>>> just to reach the initial limit. 
>>>
>>> Right
>>>
>>>> We can improve on that by implementing
>>>> a proper tree traversal to find the first node above the relevant 
>>>> limit,
>>>> and set the exact start point.
>>>
>>> Thanks for this. However, as mentioned in the other thread, this does 
>>> not help our performance regression Re: commit 4e89dce72521.
>>>
>>> And mentioning this "retry" approach again, in case it was missed, 
>>> from my experiment on the affected HW, it also has generally a 
>>> dreadfully low success rate - less than 1% for the 32b range retry. 
>>> Retry rate is about 20%. So I am saying from this 20%, less than 1% 
>>> of those succeed.

Well yeah, in your particular case you're allocating from a heavily 
over-contended address space, so much of the time it is genuinely full. 
Plus you're primarily churning one or two sizes of IOVA, so there's a 
high chance that you will either allocate immediately from the cached 
node (after a previous free), or search the whole space and fail. In 
case it was missed, searching only some arbitrary subset of the space 
before giving up is not a good behaviour for an allocator to have in 
general.

>> So since the retry means that we search through the complete pfn range 
>> most of the time (due to poor success rate), we should be able to do a 
>> better job at maintaining an accurate max alloc size, by calculating 
>> it from the range search, and not relying on max alloc failed or 
>> resetting it frequently. Hopefully that would mean that we're smarter 
>> about not trying the allocation.
> 
> So I tried that out and we seem to be able to claw back an appreciable
> amount of performance. Maybe 80% of original, with another change,
> below.

TBH if you really want to make allocation more efficient I think there 
are more radical changes that would be worth experimenting with, like 
using some form of augmented rbtree to also encode the amount of free 
space under each branch, or representing the free space in its own 
parallel tree, or seeing whether some other structure entirely might be
a better bet these days.
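
For what it's worth, the "free space under each branch" flavour might look
vaguely like the sketch below - grossly simplified, invented names, a plain
binary tree rather than an rbtree, and alignment ignored:

	struct gap_node {
		unsigned long pfn_lo, pfn_hi;	/* allocated range */
		unsigned long gap_below;	/* free PFNs between this node
						 * and its predecessor */
		unsigned long max_gap;		/* largest gap_below anywhere
						 * in this subtree */
		struct gap_node *left, *right;
	};

	static unsigned long subtree_max_gap(struct gap_node *n)
	{
		return n ? n->max_gap : 0;
	}

	/* Recompute bottom-up after insert/erase/rotation, much like the
	 * propagation callbacks in an augmented rbtree. */
	static void recompute_max_gap(struct gap_node *n)
	{
		unsigned long m = n->gap_below;

		if (subtree_max_gap(n->left) > m)
			m = subtree_max_gap(n->left);
		if (subtree_max_gap(n->right) > m)
			m = subtree_max_gap(n->right);
		n->max_gap = m;
	}

	/* An allocation of a given size can then skip any subtree whose
	 * cached maximum is too small, instead of visiting every node. */
	static int subtree_can_fit(struct gap_node *n, unsigned long size)
	{
		return subtree_max_gap(n) >= size;
	}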

And if you just want to make your thing acceptably fast, now I'm going 
to say stick a quirk somewhere to force the "forcedac" option on your 
platform ;)

[...]
>>>> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct 
>>>> iova_domain *iovad,
>>>>           if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
>>>>               high_pfn = limit_pfn;
>>>>               low_pfn = retry_pfn;
>>>> -            curr = &iovad->anchor.node;
>>>> +            curr = iova_find_limit(iovad, limit_pfn);
> 
> 
> I see that it is now applied. However, alternatively could we just add a 
> zero-length 32b boundary marker node for the 32b pfn restart point?

That would need special cases all over the place to prevent the marker 
getting merged into reservations or hit by lookups, and at worst break 
the ordering of the tree if a legitimate node straddles the boundary. I 
did consider having the insert/delete routines keep track of yet another 
cached node for whatever's currently the first thing above the 32-bit 
boundary, but I was worried that might be a bit too invasive.

FWIW I'm currently planning to come back to this again when I have a bit 
more time, since the optimum thing to do (modulo replacing the entire 
algorithm...) is actually to make the second part of the search 
*upwards* from the cached node to the limit. Furthermore, to revive my 
arch/arm conversion I think we're realistically going to need a 
compatibility option for bottom-up allocation to avoid too many nasty 
surprises, so I'd like to generalise things to tackle both concerns at once.

Thanks,
Robin.

* Re: [PATCH 2/2] iommu/iova: Improve restart logic
  2021-03-18 13:20         ` Robin Murphy
@ 2021-03-18 16:07           ` John Garry
  0 siblings, 0 replies; 7+ messages in thread
From: John Garry @ 2021-03-18 16:07 UTC (permalink / raw)
  To: Robin Murphy, joro; +Cc: vjitta, iommu, Linuxarm, linux-kernel

> 
> Well yeah, in your particular case you're allocating from a heavily 
> over-contended address space, so much of the time it is genuinely full. 
> Plus you're primarily churning one or two sizes of IOVA, so there's a 
> high chance that you will either allocate immediately from the cached 
> node (after a previous free), or search the whole space and fail. In 
> case it was missed, searching only some arbitrary subset of the space 
> before giving up is not a good behaviour for an allocator to have in 
> general.
> 
>>> So since the retry means that we search through the complete pfn 
>>> range most of the time (due to poor success rate), we should be able 
>>> to do a better job at maintaining an accurate max alloc size, by 
>>> calculating it from the range search, and not relying on max alloc 
>>> failed or resetting it frequently. Hopefully that would mean that 
>>> we're smarter about not trying the allocation.
>>
>> So I tried that out and we seem to be able to claw back an
>> appreciable amount of performance. Maybe 80% of original, with
>> another change, below.
> 
> TBH if you really want to make allocation more efficient I think there 
> are more radical changes that would be worth experimenting with, like 
> using some form of augmented rbtree to also encode the amount of free 
> space under each branch, or representing the free space in its own 
> parallel tree, or whether some other structure entirely might be a 
> better bet these days.
> 
> And if you just want to make your thing acceptably fast, now I'm going 
> to say stick a quirk somewhere to force the "forcedac" option on your 
> platform ;)
> 

Easier said than done :)

But still, I'd like to just be able to cache all IOVA sizes for my DMA 
engine, so we should not have to go near the RB tree often.

I have put together a series to allow the upper limit of the rcache
range to be increased per domain. So naturally that gives better
performance than we originally had.

I don't want to prejudice the solution by saying what I think of it now, 
so will send it out...


> [...]
>>>>> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct 
>>>>> iova_domain *iovad,
>>>>>           if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
>>>>>               high_pfn = limit_pfn;
>>>>>               low_pfn = retry_pfn;
>>>>> -            curr = &iovad->anchor.node;
>>>>> +            curr = iova_find_limit(iovad, limit_pfn);
>>
>>
>> I see that it is now applied. However, alternatively could we just add 
>> a zero-length 32b boundary marker node for the 32b pfn restart point?
> 
> That would need special cases all over the place to prevent the marker 
> getting merged into reservations or hit by lookups, and at worst break 
> the ordering of the tree if a legitimate node straddles the boundary. I 
> did consider having the insert/delete routines keep track of yet another 
> cached node for whatever's currently the first thing above the 32-bit 
> boundary, but I was worried that might be a bit too invasive.

Yeah, I did think of that. I don't think that it would have too much 
overhead.

> 
> FWIW I'm currently planning to come back to this again when I have a bit 
> more time, since the optimum thing to do (modulo replacing the entire 
> algorithm...) is actually to make the second part of the search 
> *upwards* from the cached node to the limit. Furthermore, to revive my 
> arch/arm conversion I think we're realistically going to need a 
> compatibility option for bottom-up allocation to avoid too many nasty 
> surprises, so I'd like to generalise things to tackle both concerns at 
> once.
> 

Thanks,
John

