linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
@ 2022-08-24  9:51 Peng Zhang
  2022-08-24 12:30 ` Ethan Zhao
  2022-09-21  1:15 ` wangjie (L)
  0 siblings, 2 replies; 10+ messages in thread
From: Peng Zhang @ 2022-08-24  9:51 UTC (permalink / raw)
  To: joro, will; +Cc: iommu, linux-kernel, robin.murphy, Peng Zhang

The current alloc_iova() algorithm scans the allocated iovas linearly
until it finds a gap large enough to satisfy the allocation. This can
be very slow in some scenarios. We can optimize alloc_iova() from O(n)
to O(log(n)) time complexity.

We can construct a test like this:
Write a kernel module that initializes an iova_domain with a 4k granule.
Then use a user-mode program to call into the module and allocate an iova
of size 1, 2^20 times, within an allocation limit of 2^20. The test is
single-threaded, and the low 4G space is full after the 2^20 allocations.

Finally, loop the following three steps (a sketch of this loop follows
the list):
1. Randomly releases an iova.

2. Allocate an iova of size 1 within the allocation limit of 2^20.

3. Allocate an iova of size 1 within the allocation limit of 2^20.
   This will fail and take a very long time, because max32_alloc_size
   is reset whenever an iova is released.
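
For concreteness, here is a minimal sketch of the test loop, assuming a
kernel module context (the pfns[] bookkeeping, the helper names, and the
use of prandom_u32_max() are illustrative, not the actual test module):

#include <linux/iova.h>
#include <linux/prandom.h>
#include <linux/sizes.h>
#include <linux/vmalloc.h>

#define TEST_LIMIT_PFN	(1UL << 20)

static struct iova_domain test_iovad;
static unsigned long *pfns;	/* pfn_lo of each live allocation */

static int test_setup(void)
{
	unsigned long i;

	pfns = vmalloc(TEST_LIMIT_PFN * sizeof(*pfns));
	if (!pfns)
		return -ENOMEM;

	/* 4k granule; fill the space below the limit with 2^20 size-1
	 * iovas (error handling elided). */
	init_iova_domain(&test_iovad, SZ_4K, 1);
	for (i = 0; i < TEST_LIMIT_PFN; i++)
		pfns[i] = alloc_iova(&test_iovad, 1, TEST_LIMIT_PFN,
				     false)->pfn_lo;
	return 0;
}

static void test_one_round(void)
{
	unsigned long idx = prandom_u32_max(TEST_LIMIT_PFN);

	/* 1. Randomly release an iova. */
	free_iova(&test_iovad, pfns[idx]);

	/* 2. This allocation reuses the freed gap and succeeds. */
	pfns[idx] = alloc_iova(&test_iovad, 1, TEST_LIMIT_PFN,
			       false)->pfn_lo;

	/*
	 * 3. The space below the limit is full again, so this fails;
	 * before this patch it degenerates into a full O(n) walk of
	 * the rbtree, because step 1 reset max32_alloc_size.
	 */
	WARN_ON(alloc_iova(&test_iovad, 1, TEST_LIMIT_PFN, false));
}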

The data below is the result of repeating the three steps 1024 times on
a physical machine with a CPU clocked at 2.30GHz.

Before improvement:
Tracing 1 functions for "alloc_iova"...
   nsecs             : count    distribution
     256 -> 511      : 1594    |                                      |
     512 -> 1023     : 1030686 |**************************************|
    1024 -> 2047     : 14661   |                                      |
    2048 -> 4095     : 1730    |                                      |
    4096 -> 8191     : 634     |                                      |
    8192 -> 16383    : 20      |                                      |
   16384 -> 32767    : 2       |                                      |
   32768 -> 65535    : 2       |                                      |
   65536 -> 131071   : 3       |                                      |
  131072 -> 262143   : 6       |                                      |
  262144 -> 524287   : 8       |                                      |
  524288 -> 1048575  : 19      |                                      |
 1048576 -> 2097151  : 35      |                                      |
 2097152 -> 4194303  : 55      |                                      |
 4194304 -> 8388607  : 117     |                                      |
 8388608 -> 16777215 : 165     |                                      |
16777216 -> 33554431 : 1112    |                                      |
avg = 33867 nsecs, total: 35589643563 nsecs, count: 1050849

With improvement:
Tracing 1 functions for "alloc_iova"...
nsecs             : count     distribution
  512 -> 1023     : 1033561  |****************************************|
 1024 -> 2047     : 13631    |                                        |
 2048 -> 4095     : 2981     |                                        |
 4096 -> 8191     : 448      |                                        |
 8192 -> 16383    : 5        |                                        |
16384 -> 32767    : 1        |                                        |
avg = 696 nsecs, total: 732196323 nsecs, count: 1050627

Introduce the improved algorithm:

------------------------------------------------------------------------
| gap1  |iova1| gap2 |iova2| gap3 |iova3|   gap4  |iova4| gap5  |anchor|
------------------------------------------------------------------------

let A = allocatable_size
let B = max_allocatable_size
                    ____________
                  /    iova2     \      B = max( left_child->B,
                 |       A        |              right_child->B,
                  \      B       /               A)
                    ------------
                   /            \
                  /              \
    ____________                    ____________
  /    iova1     \                /    iova4     \
 |       A        |              |       A        |
  \      B       /                \      B        /
    ------------                    ------------
                                   /            \
                                  /              \
                    ____________                    ____________
                  /    iova3     \                /    anchor    \
                 |       A        |              |       A        |
                  \      B       /                \      B        /
                    ------------                    ------------

Define the gap of an iova as the free space between the iova and its
previous iova; for example, the gap of iova3 is gap3. This gap can be
used for allocation.

Add three variables to struct iova.
prev_iova:
         points to the previous iova; for example, iova3->prev_iova
         points to iova2.

allocatable_size:
         the maximum size that can be allocated from a gap. It is not
         simply the length of the gap, because the allocated address
         may need to be aligned (see the sketch after these
         definitions).

max_allocatable_size:
         the maximum allocatable_size over all iova gaps in the
         subtree rooted at this node.

         max_allocatable_size = max( left_child->max_allocatable_size,
                                     right_child->max_allocatable_size,
                                     allocatable_size)
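
To make the alignment effect concrete, here is a userspace transcription
of the gap computation (a sketch only: __builtin_ctzl stands in for the
kernel's __ffs64(), and the logic mirrors __compute_allocatable_size()
in this patch):

#include <stdio.h>

static unsigned long compute_allocatable_size(unsigned long lo,
					      unsigned long hi)
{
	unsigned long best = 0;

	if (lo == 0)
		return hi;
	while (lo < hi) {
		/* Largest size-aligned block that can start exactly at lo. */
		unsigned long delta = 1UL << __builtin_ctzl(lo);

		if (hi - lo <= delta)
			return best > hi - lo ? best : hi - lo;
		if (delta > best)
			best = delta;
		lo += delta;
	}
	return best;
}

int main(void)
{
	/*
	 * The gap [6, 10) has length 4, but a size-4, size-aligned
	 * allocation would have to start at pfn 4 or 8, and neither
	 * placement fits inside the gap. The best we can do is size 2
	 * at pfn 8, so the answer is 2, not 4.
	 */
	printf("%lu\n", compute_allocatable_size(6, 10)); /* prints 2 */
	return 0;
}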

We can use the kernel's augmented rbtree support (rbtree_augmented.h)
to maintain max_allocatable_size in O(log(n)) time.

With max_allocatable_size and allocatable_size maintained in the
rbtree, searching for a gap to allocate from is fast, taking O(log(n))
time.

Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
---
 drivers/iommu/iova.c | 265 ++++++++++++++++++++++++++++++++-----------
 include/linux/iova.h |   5 +-
 2 files changed, 204 insertions(+), 66 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..79625ac82560 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -43,6 +43,56 @@ static struct iova *to_iova(struct rb_node *node)
 	return rb_entry(node, struct iova, node);
 }
 
+/*
+ * We can't judge whether it can be allocated only by a given interval length
+ * because the address may be aligned.
+ * This function computes the max allocatable size for a given interval.
+ * The time complexity of this function is O(log(n)).
+ */
+static unsigned long __compute_allocatable_size(unsigned long lo,
+						unsigned long hi)
+{
+	unsigned long allocatable_size = 0;
+
+	if (lo == 0)
+		return hi;
+	while (lo < hi) {
+		unsigned long delta = 1UL << __ffs64(lo);
+
+		if (hi - lo <= delta) {
+			allocatable_size = max(allocatable_size, hi - lo);
+			break;
+		}
+		allocatable_size = max(allocatable_size, delta);
+		lo += delta;
+	}
+	return allocatable_size;
+}
+
+static inline unsigned long prev_iova_high(struct iova *iova)
+{
+	return iova->prev_iova ? iova->prev_iova->pfn_hi + 1 : 0;
+}
+
+static inline unsigned long iova_compute_allocatable_size(struct iova *iova)
+{
+	return __compute_allocatable_size(prev_iova_high(iova), iova->pfn_lo);
+}
+
+static inline unsigned long iova_get_allocatable_size(struct iova *iova)
+{
+	return iova->allocatable_size;
+}
+
+RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, struct iova, node,
+			 unsigned long, max_allocatable_size,
+			 iova_get_allocatable_size)
+
+static inline void iova_max_allocatable_size_update(struct iova *iova)
+{
+	iova_gap_callbacks_propagate(&iova->node, NULL);
+}
+
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	unsigned long start_pfn)
@@ -63,8 +113,16 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
 	iovad->max32_alloc_size = iovad->dma_32bit_pfn;
 	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
+	iovad->anchor.prev_iova = NULL;
+	iovad->anchor.allocatable_size =
+				__compute_allocatable_size(0, IOVA_ANCHOR);
+	iovad->anchor.max_allocatable_size  = iovad->anchor.allocatable_size;
+
 	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
 	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+
+	if (start_pfn)
+		reserve_iova(iovad, 0, start_pfn - 1);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
@@ -87,7 +145,8 @@ __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
 }
 
 static void
-__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
+__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free,
+			      struct rb_node *next)
 {
 	struct iova *cached_iova;
 
@@ -95,51 +154,32 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 	if (free == cached_iova ||
 	    (free->pfn_hi < iovad->dma_32bit_pfn &&
 	     free->pfn_lo >= cached_iova->pfn_lo))
-		iovad->cached32_node = rb_next(&free->node);
+		iovad->cached32_node = next;
 
 	if (free->pfn_lo < iovad->dma_32bit_pfn)
 		iovad->max32_alloc_size = iovad->dma_32bit_pfn;
 
 	cached_iova = to_iova(iovad->cached_node);
 	if (free->pfn_lo >= cached_iova->pfn_lo)
-		iovad->cached_node = rb_next(&free->node);
+		iovad->cached_node = next;
 }
 
-static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
+static struct rb_node *iova_find_limit(struct iova_domain *iovad,
+				       unsigned long limit_pfn)
 {
-	struct rb_node *node, *next;
-	/*
-	 * Ideally what we'd like to judge here is whether limit_pfn is close
-	 * enough to the highest-allocated IOVA that starting the allocation
-	 * walk from the anchor node will be quicker than this initial work to
-	 * find an exact starting point (especially if that ends up being the
-	 * anchor node anyway). This is an incredibly crude approximation which
-	 * only really helps the most likely case, but is at least trivially easy.
-	 */
-	if (limit_pfn > iovad->dma_32bit_pfn)
-		return &iovad->anchor.node;
-
-	node = iovad->rbroot.rb_node;
-	while (to_iova(node)->pfn_hi < limit_pfn)
-		node = node->rb_right;
-
-search_left:
-	while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
-		node = node->rb_left;
-
-	if (!node->rb_left)
-		return node;
-
-	next = node->rb_left;
-	while (next->rb_right) {
-		next = next->rb_right;
-		if (to_iova(next)->pfn_lo >= limit_pfn) {
-			node = next;
-			goto search_left;
-		}
-	}
+	struct rb_node *curr = iovad->rbroot.rb_node;
 
-	return node;
+	while (curr) {
+		struct iova *iova = to_iova(curr);
+
+		if (limit_pfn - 1 > iova->pfn_hi)
+			curr = curr->rb_right;
+		else if (limit_pfn <= prev_iova_high(iova))
+			curr = curr->rb_left;
+		else
+			break;
+	}
+	return curr;
 }
 
 /* Insert the iova into domain rbtree by holding writer lock */
@@ -148,6 +188,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
 		   struct rb_node *start)
 {
 	struct rb_node **new, *parent = NULL;
+	struct iova *next_iova;
 
 	new = (start) ? &start : &(root->rb_node);
 	/* Figure out where to put new node */
@@ -166,61 +207,143 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
 		}
 	}
 	/* Add new node and rebalance tree. */
+
 	rb_link_node(&iova->node, parent, new);
-	rb_insert_color(&iova->node, root);
+
+	next_iova = to_iova(rb_next(&iova->node));
+	iova->prev_iova = next_iova->prev_iova;
+	next_iova->prev_iova = iova;
+
+	iova->allocatable_size = iova_compute_allocatable_size(iova);
+	next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
+
+	/*
+	 * Don't swap the following two lines, because next_iova is the ancestor
+	 * of iova and updating iova first is faster.
+	 */
+	iova_max_allocatable_size_update(iova);
+	iova_max_allocatable_size_update(next_iova);
+
+	rb_insert_augmented(&iova->node, root, &iova_gap_callbacks);
+}
+
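+/*
+ * Does an allocation of @size pfns fit in the free interval [@lo, @hi)?
+ * The candidate placement is the highest @align_mask-aligned position
+ * below min(@hi, @limit_pfn).
+ */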
+static inline bool check_interval(unsigned long lo, unsigned long hi,
+				  unsigned long limit_pfn, unsigned long size,
+				  unsigned long align_mask)
+{
+	hi = min(hi, limit_pfn);
+	if (lo >= hi)
+		return false;
+	if (hi >= size && ((hi - size) & align_mask) >= lo)
+		return true;
+	return false;
 }
 
 static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 		unsigned long size, unsigned long limit_pfn,
 			struct iova *new, bool size_aligned)
 {
-	struct rb_node *curr, *prev;
-	struct iova *curr_iova;
 	unsigned long flags;
-	unsigned long new_pfn, retry_pfn;
+	struct rb_node *curr;
+	struct rb_node *parent;
+	struct iova *curr_iova;
 	unsigned long align_mask = ~0UL;
-	unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
+	bool ignore = false;
 
 	if (size_aligned)
 		align_mask <<= fls_long(size - 1);
 
-	/* Walk the tree backwards */
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+
 	if (limit_pfn <= iovad->dma_32bit_pfn &&
 			size >= iovad->max32_alloc_size)
 		goto iova32_full;
 
 	curr = __get_cached_rbnode(iovad, limit_pfn);
 	curr_iova = to_iova(curr);
-	retry_pfn = curr_iova->pfn_hi + 1;
 
-retry:
-	do {
-		high_pfn = min(high_pfn, curr_iova->pfn_lo);
-		new_pfn = (high_pfn - size) & align_mask;
-		prev = curr;
-		curr = rb_prev(curr);
-		curr_iova = to_iova(curr);
-	} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
-
-	if (high_pfn < size || new_pfn < low_pfn) {
-		if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
-			high_pfn = limit_pfn;
-			low_pfn = retry_pfn;
-			curr = iova_find_limit(iovad, limit_pfn);
-			curr_iova = to_iova(curr);
-			goto retry;
+	if (limit_pfn >= curr_iova->pfn_lo &&
+	    curr_iova->allocatable_size >= size)
+		goto found;
+
+	/* If limit_pfn > dma_32bit_pfn, this could be faster. */
+	if (limit_pfn > iovad->dma_32bit_pfn) {
+		curr_iova = to_iova(&iovad->anchor.node);
+
+		while (curr_iova) {
+			if (check_interval(prev_iova_high(curr_iova),
+					   curr_iova->pfn_lo, limit_pfn,
+					   size, align_mask))
+				goto found;
+			curr_iova = curr_iova->prev_iova;
 		}
 		iovad->max32_alloc_size = size;
 		goto iova32_full;
 	}
 
+	curr = iova_find_limit(iovad, limit_pfn);
+	curr_iova = to_iova(curr);
+
+	if (check_interval(prev_iova_high(curr_iova),
+			   curr_iova->pfn_lo, limit_pfn,
+			   size, align_mask))
+		goto found;
+
+	while (true) {
+		/* Check left subtree */
+		if (!ignore && curr->rb_left) {
+			curr_iova = to_iova(curr->rb_left);
+			if (curr_iova->max_allocatable_size >= size)
+				goto check_subtree;
+		}
+
+		parent = rb_parent(curr);
+		if (parent == NULL)
+			break;
+		/*
+	 * If the current node is the left child of its parent, the
+	 * parent node and the parent's right subtree need not be
+	 * checked because they exceed limit_pfn.
+		 */
+		ignore = parent->rb_left == curr;
+		curr = parent;
+
+		/* Check current node. */
+		if (!ignore) {
+			curr_iova = to_iova(curr);
+			if (curr_iova->allocatable_size >= size)
+				goto found;
+		}
+	}
+	if (limit_pfn >= iovad->dma_32bit_pfn)
+		iovad->max32_alloc_size = size;
+	goto iova32_full;
+
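+/*
+ * max_allocatable_size guarantees that a big-enough gap exists
+ * somewhere in this subtree; walk down to it, preferring the current
+ * node's own gap, then the right subtree, then the left.
+ */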
+check_subtree:
+	while (true) {
+		if (curr_iova->allocatable_size >= size)
+			goto found;
+
+		curr = &curr_iova->node;
+		if (curr->rb_right &&
+			to_iova(curr->rb_right)->max_allocatable_size >= size) {
+			curr_iova = to_iova(curr->rb_right);
+			continue;
+		}
+		WARN_ON(curr->rb_left == NULL);
+		curr_iova = to_iova(curr->rb_left);
+	}
+
+found:
 	/* pfn_lo will point to size aligned address if size_aligned is set */
-	new->pfn_lo = new_pfn;
+	new->pfn_lo = (min(curr_iova->pfn_lo, limit_pfn) - size) & align_mask;
 	new->pfn_hi = new->pfn_lo + size - 1;
 
-	/* If we have 'prev', it's a valid place to start the insertion. */
-	iova_insert_rbtree(&iovad->rbroot, new, prev);
+	/*
+	 * If we have 'prev' or 'next',
+	 * it's a valid place to start the insertion.
+	 */
+	iova_insert_rbtree(&iovad->rbroot, new, &curr_iova->node);
 	__cached_rbnode_insert_update(iovad, new);
 
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
@@ -352,9 +475,18 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
 
 static void remove_iova(struct iova_domain *iovad, struct iova *iova)
 {
+	struct rb_node *next;
+	struct iova *next_iova;
 	assert_spin_locked(&iovad->iova_rbtree_lock);
-	__cached_rbnode_delete_update(iovad, iova);
-	rb_erase(&iova->node, &iovad->rbroot);
+
+	next = rb_next(&iova->node);
+	__cached_rbnode_delete_update(iovad, iova, next);
+
+	next_iova = to_iova(next);
+	next_iova->prev_iova = iova->prev_iova;
+	next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
+	iova_max_allocatable_size_update(next_iova);
+	rb_erase_augmented(&iova->node, &iovad->rbroot, &iova_gap_callbacks);
 }
 
 /**
@@ -554,8 +686,11 @@ static void
 __adjust_overlap_range(struct iova *iova,
 	unsigned long *pfn_lo, unsigned long *pfn_hi)
 {
-	if (*pfn_lo < iova->pfn_lo)
+	if (*pfn_lo < iova->pfn_lo) {
 		iova->pfn_lo = *pfn_lo;
+		iova->allocatable_size = iova_compute_allocatable_size(iova);
+		iova_max_allocatable_size_update(iova);
+	}
 	if (*pfn_hi > iova->pfn_hi)
 		*pfn_lo = iova->pfn_hi + 1;
 }
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..feb8121f104d 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -11,7 +11,7 @@
 
 #include <linux/types.h>
 #include <linux/kernel.h>
-#include <linux/rbtree.h>
+#include <linux/rbtree_augmented.h>
 #include <linux/dma-mapping.h>
 
 /* iova structure */
@@ -19,6 +19,9 @@ struct iova {
 	struct rb_node	node;
 	unsigned long	pfn_hi; /* Highest allocated pfn */
 	unsigned long	pfn_lo; /* Lowest allocated pfn */
+	struct iova	*prev_iova;
+	unsigned long	allocatable_size;
+	unsigned long	max_allocatable_size;
 };
 
 
-- 
2.20.1



* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-08-24  9:51 [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented Peng Zhang
@ 2022-08-24 12:30 ` Ethan Zhao
  2022-08-25  8:10   ` [External] " Peng Zhang
  2022-09-21  1:15 ` wangjie (L)
  1 sibling, 1 reply; 10+ messages in thread
From: Ethan Zhao @ 2022-08-24 12:30 UTC (permalink / raw)
  To: Peng Zhang, joro, will; +Cc: iommu, linux-kernel, robin.murphy

Hi,

On 2022/8/24 17:51, Peng Zhang wrote:
> The current alloc_iova() algorithm scans the allocated iovas linearly
> until it finds a gap large enough to satisfy the allocation. This can
> be very slow in some scenarios. We can optimize alloc_iova() from O(n)
What kind of scenario, for example?
> to O(log(n)) time complexity.
>
> We can construct a test like this:
> Write a kernel module that initializes an iova_domain with a 4k granule.
> Then use a user-mode program to call into the module and allocate an iova
> of size 1, 2^20 times, within an allocation limit of 2^20. The test is
> single-threaded, and the low 4G space is full after the 2^20 allocations.
>
> Finally loop the following three steps:
> 1. Randomly releases an iova.
>
> 2. Allocate an iova of size 1 within the allocation limit of 2^20.
>
> 3. Allocate an iova of size 1 within the allocation limit of 2^20.
>     This will fail and take a very long time, because max32_alloc_size
>     is reset whenever an iova is released.
>
> The data below is the result of repeating the three steps 1024 times in
> a physical machine with a CPU clocked at 2.30GHz
>
> Before improvement:
> Tracing 1 functions for "alloc_iova"...

Out of curiosity, do you have numbers for "alloc_iova + free_iova" and
"alloc_iova_fast + free_iova_fast"?

>     nsecs             : count    distbution

s/distbution/distribution ?


Thanks,

Ethan

-- 
"firm, enduring, strong, and long-lived"



* Re: [External] Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-08-24 12:30 ` Ethan Zhao
@ 2022-08-25  8:10   ` Peng Zhang
  2022-08-26  8:58     ` Ethan Zhao
  0 siblings, 1 reply; 10+ messages in thread
From: Peng Zhang @ 2022-08-25  8:10 UTC (permalink / raw)
  To: Ethan Zhao, joro, will; +Cc: iommu, linux-kernel, robin.murphy


Hi,

Here is a real example, from kernel 5.4.56. It occurs when a lot of
iovas are held unreleased for a long time.

[Wed May 25 05:27:59 2022] watchdog: BUG: soft lockup - CPU#58 stuck for 
23s! [ksoftirqd/58:302]
[Wed May 25 05:27:59 2022] Call Trace:
[Wed May 25 05:27:59 2022]  alloc_iova+0xf2/0x140
[Wed May 25 05:27:59 2022]  alloc_iova_fast+0x56/0x251
[Wed May 25 05:27:59 2022]  dma_ops_alloc_iova.isra.27+0x4b/0x70
[Wed May 25 05:27:59 2022]  __map_single.isra.28+0x4a/0x1d0
[Wed May 25 05:27:59 2022]  mlx5e_sq_xmit+0x98d/0x12b0 [mlx5_core]
[Wed May 25 05:27:59 2022]  ? packet_rcv+0x43/0x460
[Wed May 25 05:27:59 2022]  ? dev_hard_start_xmit+0x90/0x1e0
[Wed May 25 05:27:59 2022]  ? sch_direct_xmit+0x111/0x320
[Wed May 25 05:27:59 2022]  ? __qdisc_run+0x143/0x540
[Wed May 25 05:27:59 2022]  ? __dev_queue_xmit+0x6c3/0x8e0
[Wed May 25 05:27:59 2022]  ? ip_finish_output2+0x2d5/0x580
[Wed May 25 05:27:59 2022]  ? __ip_finish_output+0xe9/0x1b0
[Wed May 25 05:27:59 2022]  ? ip_output+0x6c/0xe0
[Wed May 25 05:27:59 2022]  ? __ip_finish_output+0x1b0/0x1b0
[Wed May 25 05:27:59 2022]  ? __ip_queue_xmit+0x15d/0x420
[Wed May 25 05:27:59 2022]  ? __tcp_transmit_skb+0x405/0x600
[Wed May 25 05:27:59 2022]  ? tcp_delack_timer_handler+0xb7/0x1b0
[Wed May 25 05:27:59 2022]  ? tcp_delack_timer+0x8b/0xa0
[Wed May 25 05:27:59 2022]  ? tcp_delack_timer_handler+0x1b0/0x1b0
[Wed May 25 05:27:59 2022]  ? call_timer_fn+0x2b/0x120
[Wed May 25 05:27:59 2022]  ? run_timer_softirq+0x1a6/0x420
[Wed May 25 05:27:59 2022]  ? update_load_avg+0x7e/0x640
[Wed May 25 05:27:59 2022]  ? update_curr+0xe1/0x1d0
[Wed May 25 05:27:59 2022]  ? __switch_to+0x7a/0x3e0
[Wed May 25 05:27:59 2022]  ? __do_softirq+0xda/0x2da
[Wed May 25 05:27:59 2022]  ? sort_range+0x20/0x20
[Wed May 25 05:27:59 2022]  ? run_ksoftirqd+0x26/0x40
[Wed May 25 05:27:59 2022]  ? smpboot_thread_fn+0xb8/0x150
[Wed May 25 05:27:59 2022]  ? kthread+0x110/0x130
[Wed May 25 05:27:59 2022]  ? kthread_park+0x80/0x80
[Wed May 25 05:27:59 2022]  ? ret_from_fork+0x1f/0x30

I did some more tests.

The test is single-threaded.
Granule is 4k, limit is 2^20.

When 1/4 of the address space is occupied by iovas, repeat the
following two steps:

1. Randomly releases an iova.
2. Allocate an iova of size 1 within the allocation limit of 2^20.

Before improvement:
> Tracing 1 functions for "alloc_iova"... Hit Ctrl-C to end.
> ^C
>      nsecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 0        |                                        |
>         16 -> 31         : 0        |                                        |
>         32 -> 63         : 0        |                                        |
>         64 -> 127        : 0        |                                        |
>        128 -> 255        : 0        |                                        |
>        256 -> 511        : 352      |                                        |
>        512 -> 1023       : 258078   |****************************************|
>       1024 -> 2047       : 3612     |                                        |
>       2048 -> 4095       : 426      |                                        |
>       4096 -> 8191       : 183      |                                        |
>       8192 -> 16383      : 6        |                                        |
>      16384 -> 32767      : 5        |                                        |
>      32768 -> 65535      : 9        |                                        |
>      65536 -> 131071     : 18       |                                        |
>     131072 -> 262143     : 28       |                                        |
>     262144 -> 524287     : 74       |                                        |
>     524288 -> 1048575    : 109      |                                        |
>    1048576 -> 2097151    : 170      |                                        |
>    2097152 -> 4194303    : 100      |                                        |
>    4194304 -> 8388607    : 1        |                                        |
> 
> avg = 3110 nsecs, total: 818614399 nsecs, count: 263171
> 
> Tracing 1 functions for "remove_iova"... Hit Ctrl-C to end.
> ^C
>      nsecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 0        |                                        |
>         16 -> 31         : 0        |                                        |
>         32 -> 63         : 0        |                                        |
>         64 -> 127        : 0        |                                        |
>        128 -> 255        : 0        |                                        |
>        256 -> 511        : 250651   |****************************************|
>        512 -> 1023       : 12405    |*                                       |
>       1024 -> 2047       : 111      |                                        |
>       2048 -> 4095       : 1        |                                        |
> 
> avg = 433 nsecs, total: 114136319 nsecs, count: 263168

With improvement:
> Tracing 1 functions for "alloc_iova"... Hit Ctrl-C to end.
> ^C
>      nsecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 0        |                                        |
>         16 -> 31         : 0        |                                        |
>         32 -> 63         : 0        |                                        |
>         64 -> 127        : 0        |                                        |
>        128 -> 255        : 0        |                                        |
>        256 -> 511        : 0        |                                        |
>        512 -> 1023       : 258975   |****************************************|
>       1024 -> 2047       : 3618     |                                        |
>       2048 -> 4095       : 497      |                                        |
>       4096 -> 8191       : 74       |                                        |
>       8192 -> 16383      : 4        |                                        |
>      16384 -> 32767      : 1        |                                        |
> 
> avg = 637 nsecs, total: 167854061 nsecs, count: 263169
> 
> Tracing 1 functions for "remove_iova"... Hit Ctrl-C to end.
> ^C
>      nsecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 0        |                                        |
>         16 -> 31         : 0        |                                        |
>         32 -> 63         : 0        |                                        |
>         64 -> 127        : 0        |                                        |
>        128 -> 255        : 0        |                                        |
>        256 -> 511        : 221560   |****************************************|
>        512 -> 1023       : 41427    |*******                                 |
>       1024 -> 2047       : 179      |                                        |
>       2048 -> 4095       : 2        |                                        |
> 
> avg = 477 nsecs, total: 125540399 nsecs, count: 263168


> s/distbution/distribution ?
Sorry, it's a typo.

I don't have a test program for "alloc_iova_fast + free_iova_fast"
right now.

Thanks,

Peng


* Re: [External] Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-08-25  8:10   ` [External] " Peng Zhang
@ 2022-08-26  8:58     ` Ethan Zhao
  2022-08-26 10:28       ` Peng Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Ethan Zhao @ 2022-08-26  8:58 UTC (permalink / raw)
  To: Peng Zhang, joro, will; +Cc: iommu, linux-kernel, robin.murphy

Peng,

On 2022/8/25 16:10, Peng Zhang wrote:
>
> Hi,
>
> Here is a real example. The version of kernel is 5.4.56.
> Occurs when a lot of iova are not released for a long time.
>
> [Wed May 25 05:27:59 2022] watchdog: BUG: soft lockup - CPU#58 stuck 
> for 23s! [ksoftirqd/58:302]
> [Wed May 25 05:27:59 2022] Call Trace:
> [Wed May 25 05:27:59 2022]  alloc_iova+0xf2/0x140
> [Wed May 25 05:27:59 2022]  alloc_iova_fast+0x56/0x251
The rcache doesn't work at all, the worst case.
> [Wed May 25 05:27:59 2022]  dma_ops_alloc_iova.isra.27+0x4b/0x70
> [Wed May 25 05:27:59 2022]  __map_single.isra.28+0x4a/0x1d0
> [Wed May 25 05:27:59 2022]  mlx5e_sq_xmit+0x98d/0x12b0 [mlx5_core]
> [Wed May 25 05:27:59 2022]  ? packet_rcv+0x43/0x460
> [Wed May 25 05:27:59 2022]  ? dev_hard_start_xmit+0x90/0x1e0
> [Wed May 25 05:27:59 2022]  ? sch_direct_xmit+0x111/0x320
> [Wed May 25 05:27:59 2022]  ? __qdisc_run+0x143/0x540
> [Wed May 25 05:27:59 2022]  ? __dev_queue_xmit+0x6c3/0x8e0
> [Wed May 25 05:27:59 2022]  ? ip_finish_output2+0x2d5/0x580
> [Wed May 25 05:27:59 2022]  ? __ip_finish_output+0xe9/0x1b0
> [Wed May 25 05:27:59 2022]  ? ip_output+0x6c/0xe0
> [Wed May 25 05:27:59 2022]  ? __ip_finish_output+0x1b0/0x1b0
> [Wed May 25 05:27:59 2022]  ? __ip_queue_xmit+0x15d/0x420
> [Wed May 25 05:27:59 2022]  ? __tcp_transmit_skb+0x405/0x600
> [Wed May 25 05:27:59 2022]  ? tcp_delack_timer_handler+0xb7/0x1b0
> [Wed May 25 05:27:59 2022]  ? tcp_delack_timer+0x8b/0xa0
> [Wed May 25 05:27:59 2022]  ? tcp_delack_timer_handler+0x1b0/0x1b0
> [Wed May 25 05:27:59 2022]  ? call_timer_fn+0x2b/0x120
> [Wed May 25 05:27:59 2022]  ? run_timer_softirq+0x1a6/0x420
> [Wed May 25 05:27:59 2022]  ? update_load_avg+0x7e/0x640
> [Wed May 25 05:27:59 2022]  ? update_curr+0xe1/0x1d0
> [Wed May 25 05:27:59 2022]  ? __switch_to+0x7a/0x3e0
> [Wed May 25 05:27:59 2022]  ? __do_softirq+0xda/0x2da
> [Wed May 25 05:27:59 2022]  ? sort_range+0x20/0x20
> [Wed May 25 05:27:59 2022]  ? run_ksoftirqd+0x26/0x40
> [Wed May 25 05:27:59 2022]  ? smpboot_thread_fn+0xb8/0x150
> [Wed May 25 05:27:59 2022]  ? kthread+0x110/0x130
> [Wed May 25 05:27:59 2022]  ? kthread_park+0x80/0x80
> [Wed May 25 05:27:59 2022]  ? ret_from_fork+0x1f/0x30
>
> I did some more tests.
>
> The test is single threaded.
> Granule is 4k, limit is 2^20.
>
> When the 1/4 address space is occupied by iova,
> Repeat the following two steps:
>
> 1. Randomly releases an iova.
> 2. Allocate an iova of size 1 within the allocation limit of 2^20.
>
> Before improvement:
>> Tracing 1 functions for "alloc_iova"...
>> avg = 3110 nsecs, total: 818614399 nsecs, count: 263171
>> Tracing 1 functions for "remove_iova"...
>> avg = 433 nsecs, total: 114136319 nsecs, count: 263168
>
> With improvement:
>> Tracing 1 functions for "alloc_iova"...
>> avg = 637 nsecs, total: 167854061 nsecs, count: 263169
>> Tracing 1 functions for "remove_iova"...
>> avg = 477 nsecs, total: 125540399 nsecs, count: 263168

Though only 3-4 drivers use alloc_iova() directly, in my understanding
your test simulates the worst case, where the rcache doesn't work at
all, and the "alloc_iova" + "remove_iova" numbers look great for that
worst case.


Reviewed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>



>
>> s/distbution/distribution ?
> Sorry, it's a typo.
>
> I don't have a test program for "alloc_iova_fast + free_iova_fast"
> right now.
>
> Thanks,
>
> Peng

-- 
"firm, enduring, strong, and long-lived"



* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-08-26  8:58     ` Ethan Zhao
@ 2022-08-26 10:28       ` Peng Zhang
  2022-09-01 10:45         ` John Garry
  0 siblings, 1 reply; 10+ messages in thread
From: Peng Zhang @ 2022-08-26 10:28 UTC (permalink / raw)
  To: Ethan Zhao, joro, will; +Cc: iommu, linux-kernel, robin.murphy


> Though only 3-4 drivers use alloc_iova() directly, in my understanding
> your test simulates the worst case, where the rcache doesn't work at
> all, and the "alloc_iova" + "remove_iova" numbers look great for that
> worst case.

There is another case: when the allocation size is greater than 2^5,
alloc_iova() will always be called even if alloc_iova_fast() is used,
because the maximum iova size the rcache supports is 32 pages
(IOVA_RANGE_CACHE_MAX_SIZE specifies this limit). A paraphrased sketch
of the check follows.
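
For reference, the fast path bails out on large sizes roughly like this
(paraphrased from drivers/iommu/iova.c of this era; details vary by
kernel version):

#define IOVA_RANGE_CACHE_MAX_SIZE 6 /* log of max cached IOVA range size (in pages) */

static unsigned long iova_rcache_get(struct iova_domain *iovad,
				     unsigned long size,
				     unsigned long limit_pfn)
{
	unsigned int log_size = order_base_2(size);

	/* Sizes above 2^5 pages skip the per-CPU caches entirely. */
	if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
		return 0;	/* caller falls back to the rbtree path */

	return __iova_rcache_get(&iovad->rcaches[log_size], limit_pfn - size);
}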

Thanks,

Peng


* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-08-26 10:28       ` Peng Zhang
@ 2022-09-01 10:45         ` John Garry
  2022-09-02  3:30           ` Peng Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2022-09-01 10:45 UTC (permalink / raw)
  To: Peng Zhang, Ethan Zhao, joro, will; +Cc: iommu, linux-kernel, robin.murphy

On 26/08/2022 11:28, Peng Zhang wrote:
> 
>> Though only 3-4 drivers use alloc_iova() directly, in my understanding
>>
>> your test has simulated the worst case, rcache doesn't work at all,
>>
>> "alloc_iova" +“remove_iova” number looks great for worst case.
> 
> There is another case, when the size to allocate greater to 2^5, even if 
> alloc_iova_fast() is used, alloc_iova() will always be called because 
> the maximum iova size that rcache supports to allocate is 32.
> IOVA_RANGE_CACHE_MAX_SIZE specifies the maximum size.
> 


If you really have a performance issue with alloc_iova_fast() ->
alloc_iova(), then I suggest that you consider using
dma_opt_mapping_size() to teach the DMA engine driver not to create
requests whose overall size exceeds the rcache limit, for example as
sketched below.
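
A rough sketch of that pattern for a SCSI-style driver (here
dma_opt_mapping_size() is the real DMA API helper; the function and
clamping site are illustrative):

#include <linux/blkdev.h>
#include <linux/dma-mapping.h>
#include <scsi/scsi_host.h>

/* Clamp the host's maximum transfer so that a single request maps
 * within the IOVA rcache limit. */
static void cap_transfer_size(struct Scsi_Host *shost, struct device *dev)
{
	size_t opt = dma_opt_mapping_size(dev);

	shost->max_sectors = min_t(size_t, shost->max_sectors,
				   opt >> SECTOR_SHIFT);
}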

Thanks,
John



* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-09-01 10:45         ` John Garry
@ 2022-09-02  3:30           ` Peng Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Peng Zhang @ 2022-09-02  3:30 UTC (permalink / raw)
  To: John Garry, joro, will
  Cc: iommu, linux-kernel, robin.murphy, xieyongji, Ethan Zhao


> If you really have a performance issue with alloc_iova_fast() -> 
> alloc_iova() then I suggest that you consider trying to use 
> dma_opt_mapping_size() to teach the DMA engine driver to not create 
> requests whose overall size exceeds the rcache limit.

Yes, but I don't think that essentially solves the problem.
A library for users should run stably, and it shouldn't hold
the spinlock for a long time in some cases. That could even
be called a bug.

Like this:
[Wed May 25 05:27:59 2022] watchdog: BUG: soft lockup - CPU#58
stuck for 23s!
[Wed May 25 05:27:59 2022] Call Trace:
[Wed May 25 05:27:59 2022]  alloc_iova+0xf2/0x140
[Wed May 25 05:27:59 2022]  alloc_iova_fast+0x56/0x251

For now we avoid this problem with iommu=pt, but that doesn't really
solve the underlying problem.
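
To make the failure mode concrete: the current allocator does its whole
backwards walk under the rbtree lock with IRQs off (condensed from the
code this patch removes; the retry path is omitted):

	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
	do {
		high_pfn = min(high_pfn, curr_iova->pfn_lo);
		new_pfn = (high_pfn - size) & align_mask;
		prev = curr;
		curr = rb_prev(curr);	/* O(n) walk over all iovas */
		curr_iova = to_iova(curr);
	} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);

With a fragmented space and max32_alloc_size reset, one allocation can
scan every iova with interrupts disabled, which is what trips the
watchdog above.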

Thanks,
Peng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-08-24  9:51 [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented Peng Zhang
  2022-08-24 12:30 ` Ethan Zhao
@ 2022-09-21  1:15 ` wangjie (L)
  2022-09-21  3:55   ` Peng Zhang
  1 sibling, 1 reply; 10+ messages in thread
From: wangjie (L) @ 2022-09-21  1:15 UTC (permalink / raw)
  To: Peng Zhang, joro, will; +Cc: iommu, linux-kernel, robin.murphy

This patch seems to solve the performance issues I have.  Currently my 
nic's rx performance is unstable in large-capacity scenarios. I applied 
this patch to 5.19-rc4 and tested rx performance 8 times in these 
scenarios. Here are the test results: the "before" row is the result of 
5.19-rc4, the "after" row is 5.19-rc4 with this patch, and the unit is 
Mbits/s.

        1      2      3      4      5      6      7      8
before  55430  76701  84194  77560  88292  90106  87770  77273
after   92770  92767  92792  92764  92742  92696  92781  92756

Obviously, after using this patch, the performance is stable.

On 2022/8/24 17:51, Peng Zhang wrote:
> The current algorithm of alloc_iova is to scan all iovas until it finds
> a gap that satisfies the condition to allocate. This can be very slow in
> some scenarios. We can optimize alloc_iova() from time complexity O(n)
> to O(log(n)).
>
> We can make a test like this:
> Write a module and initialize iova_domain with 4k granule.
> Then using a user-mode program to call the module to allocate iova of
> size 1 2^20 times within the allocation limit of 2^20. This is single
> threaded and the low 4g space is full after 2^20 allocations.
>
> Finally loop the following three steps:
> 1. Randomly releases an iova.
>
> 2. Allocate an iova of size 1 within the allocation limit of 2^20.
>
> 3. Allocate an iova of size 1 within the allocation limit of 2^20.
>    This will fail and take a very long time, because max32_alloc_size
>    is reset whenever an iova is released.
>
> The data below is the result of repeating the three steps 1024 times in
> a physical machine with a CPU clocked at 2.30GHz
>
> Before improvement:
> Tracing 1 functions for "alloc_iova"...
>    nsecs             : count    distbution
>      256 -> 511      : 1594    |                                      |
>      512 -> 1023     : 1030686 |**************************************|
>     1024 -> 2047     : 14661   |                                      |
>     2048 -> 4095     : 1730    |                                      |
>     4096 -> 8191     : 634     |                                      |
>     8192 -> 16383    : 20      |                                      |
>    16384 -> 32767    : 2       |                                      |
>    32768 -> 65535    : 2       |                                      |
>    65536 -> 131071   : 3       |                                      |
>   131072 -> 262143   : 6       |                                      |
>   262144 -> 524287   : 8       |                                      |
>   524288 -> 1048575  : 19      |                                      |
>  1048576 -> 2097151  : 35      |                                      |
>  2097152 -> 4194303  : 55      |                                      |
>  4194304 -> 8388607  : 117     |                                      |
>  8388608 -> 16777215 : 165     |                                      |
> 16777216 -> 33554431 : 1112    |                                      |
> avg = 33867 nsecs, total: 35589643563 nsecs, count: 1050849
>
> With improvement:
> Tracing 1 functions for "alloc_iova"...
> nsecs             : count     distribution
>   512 -> 1023     : 1033561  |****************************************|
>  1024 -> 2047     : 13631    |                                        |
>  2048 -> 4095     : 2981     |                                        |
>  4096 -> 8191     : 448      |                                        |
>  8192 -> 16383    : 5        |                                        |
> 16384 -> 32767    : 1        |                                        |
> avg = 696 nsecs, total: 732196323 nsecs, count: 1050627
>
> Introduce the improved algorithm:
>
> ------------------------------------------------------------------------
> | gap1  |iova1| gap2 |iova2| gap3 |iova3|   gap4  |iova4| gap5  |anchor|
> ------------------------------------------------------------------------
>
> let A = allocatable_size
> let B = max_allocatable_size
>                     ____________
>                   /    iova2     \      B = max( left_child->B,
>                  |       A        |              right_child->B,
>                   \      B       /               A)
>                     ------------
>                    /            \
>                   /              \
>     ____________                    ____________
>   /    iova1     \                /    iova4     \
>  |       A        |              |       A        |
>   \      B       /                \      B        /
>     ------------                    ------------
>                                    /            \
>                                   /              \
>                     ____________                    ____________
>                   /    iova3     \                /    anchor    \
>                  |       A        |              |       A        |
>                   \      B       /                \      B        /
>                     ------------                    ------------
>
> Define the gap of an iova as the gap between the iova and its previous
> iova; for example, the gap of iova3 is gap3. This gap can be used to allocate.
>
> Add three variables to struct iova.
> prev_iova:
>          points to the previous iova, such as iova3->prev_iova pointing
>          to iova2.
>
> allocatable_size:
>          allocatable_size is the max size can be allocated from a gap.
>          It is not the length of a gap because the allocated address
>          may need to be aligned.
>
> max_allocatable_size:
>          max_allocatable_size is the max allocatable_size of all iova's
>          gap in the subtree.
>
>          max_allocatable_size = max( left_child->max_allocatable_size,
>                                      right_child->max_allocatable_size,
>                                      allocatable_size)
>
> We can use rbtree_augmented to maintain max_allocatable_size in time
> complexity O(log(n)).
>
> In the rbtree, with the max_allocatable_size and allocatable_size,
> searching the gap to allocate is fast and the time complexity is
> O(log(n)).
>
> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> ---
>  drivers/iommu/iova.c | 265 ++++++++++++++++++++++++++++++++-----------
>  include/linux/iova.h |   5 +-
>  2 files changed, 204 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index db77aa675145..79625ac82560 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -43,6 +43,56 @@ static struct iova *to_iova(struct rb_node *node)
>  	return rb_entry(node, struct iova, node);
>  }
>
> +/*
> + * We can't judge whether an allocation can fit only by a given interval
> + * length, because the allocated address may need to be aligned.
> + * This function computes the max allocatable size for a given interval.
> + * The time complexity of this function is O(log(n)).
> + */
> +static unsigned long __compute_allocatable_size(unsigned long lo,
> +						unsigned long hi)
> +{
> +	unsigned long allocatable_size = 0;
> +
> +	if (lo == 0)
> +		return hi;
> +	while (lo < hi) {
> +		unsigned long delta = 1UL << __ffs64(lo);
> +
> +		if (hi - lo <= delta) {
> +			allocatable_size = max(allocatable_size, hi - lo);
> +			break;
> +		}
> +		allocatable_size = max(allocatable_size, delta);
> +		lo += delta;
> +	}
> +	return allocatable_size;
> +}
> +
> +static inline unsigned long prev_iova_high(struct iova *iova)
> +{
> +	return iova->prev_iova ? iova->prev_iova->pfn_hi + 1 : 0;
> +}
> +
> +static inline unsigned long iova_compute_allocatable_size(struct iova *iova)
> +{
> +	return __compute_allocatable_size(prev_iova_high(iova), iova->pfn_lo);
> +}
> +
> +static inline unsigned long iova_get_allocatable_size(struct iova *iova)
> +{
> +	return iova->allocatable_size;
> +}
> +
> +RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, struct iova, node,
> +			 unsigned long, max_allocatable_size,
> +			 iova_get_allocatable_size)
> +
> +static inline void iova_max_allocatable_size_update(struct iova *iova)
> +{
> +	iova_gap_callbacks_propagate(&iova->node, NULL);
> +}
> +
>  void
>  init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>  	unsigned long start_pfn)
> @@ -63,8 +113,16 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>  	iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
>  	iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>  	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
> +	iovad->anchor.prev_iova = NULL;
> +	iovad->anchor.allocatable_size =
> +				__compute_allocatable_size(0, IOVA_ANCHOR);
> +	iovad->anchor.max_allocatable_size  = iovad->anchor.allocatable_size;
> +
>  	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
>  	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
> +
> +	if (start_pfn)
> +		reserve_iova(iovad, 0, start_pfn - 1);
>  }
>  EXPORT_SYMBOL_GPL(init_iova_domain);
>
> @@ -87,7 +145,8 @@ __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
>  }
>
>  static void
> -__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> +__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free,
> +			      struct rb_node *next)
>  {
>  	struct iova *cached_iova;
>
> @@ -95,51 +154,32 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>  	if (free == cached_iova ||
>  	    (free->pfn_hi < iovad->dma_32bit_pfn &&
>  	     free->pfn_lo >= cached_iova->pfn_lo))
> -		iovad->cached32_node = rb_next(&free->node);
> +		iovad->cached32_node = next;
>
>  	if (free->pfn_lo < iovad->dma_32bit_pfn)
>  		iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>
>  	cached_iova = to_iova(iovad->cached_node);
>  	if (free->pfn_lo >= cached_iova->pfn_lo)
> -		iovad->cached_node = rb_next(&free->node);
> +		iovad->cached_node = next;
>  }
>
> -static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
> +static struct rb_node *iova_find_limit(struct iova_domain *iovad,
> +				       unsigned long limit_pfn)
>  {
> -	struct rb_node *node, *next;
> -	/*
> -	 * Ideally what we'd like to judge here is whether limit_pfn is close
> -	 * enough to the highest-allocated IOVA that starting the allocation
> -	 * walk from the anchor node will be quicker than this initial work to
> -	 * find an exact starting point (especially if that ends up being the
> -	 * anchor node anyway). This is an incredibly crude approximation which
> -	 * only really helps the most likely case, but is at least trivially easy.
> -	 */
> -	if (limit_pfn > iovad->dma_32bit_pfn)
> -		return &iovad->anchor.node;
> -
> -	node = iovad->rbroot.rb_node;
> -	while (to_iova(node)->pfn_hi < limit_pfn)
> -		node = node->rb_right;
> -
> -search_left:
> -	while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
> -		node = node->rb_left;
> -
> -	if (!node->rb_left)
> -		return node;
> -
> -	next = node->rb_left;
> -	while (next->rb_right) {
> -		next = next->rb_right;
> -		if (to_iova(next)->pfn_lo >= limit_pfn) {
> -			node = next;
> -			goto search_left;
> -		}
> -	}
> +	struct rb_node *curr = iovad->rbroot.rb_node;
>
> -	return node;
> +	while (curr) {
> +		struct iova *iova = to_iova(curr);
> +
> +		if (limit_pfn - 1 > iova->pfn_hi)
> +			curr = curr->rb_right;
> +		else if (limit_pfn <= prev_iova_high(iova))
> +			curr = curr->rb_left;
> +		else
> +			break;
> +	}
> +	return curr;
>  }
>
>  /* Insert the iova into domain rbtree by holding writer lock */
> @@ -148,6 +188,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
>  		   struct rb_node *start)
>  {
>  	struct rb_node **new, *parent = NULL;
> +	struct iova *next_iova;
>
>  	new = (start) ? &start : &(root->rb_node);
>  	/* Figure out where to put new node */
> @@ -166,61 +207,143 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
>  		}
>  	}
>  	/* Add new node and rebalance tree. */
> +
>  	rb_link_node(&iova->node, parent, new);
> -	rb_insert_color(&iova->node, root);
> +
> +	next_iova = to_iova(rb_next(&iova->node));
> +	iova->prev_iova = next_iova->prev_iova;
> +	next_iova->prev_iova = iova;
> +
> +	iova->allocatable_size = iova_compute_allocatable_size(iova);
> +	next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
> +
> +	/*
> +	 * Don't swap the following two lines, because next_iova is the ancestor
> +	 * of iova and updating iova first is faster.
> +	 */
> +	iova_max_allocatable_size_update(iova);
> +	iova_max_allocatable_size_update(next_iova);
> +
> +	rb_insert_augmented(&iova->node, root, &iova_gap_callbacks);
> +}
> +
> +static inline bool check_interval(unsigned long lo, unsigned long hi,
> +				  unsigned long limit_pfn, unsigned long size,
> +				  unsigned long align_mask)
> +{
> +	hi = min(hi, limit_pfn);
> +	if (lo >= hi)
> +		return false;
> +	if (hi >= size && ((hi - size) & align_mask) >= lo)
> +		return true;
> +	return false;
>  }
>
>  static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>  		unsigned long size, unsigned long limit_pfn,
>  			struct iova *new, bool size_aligned)
>  {
> -	struct rb_node *curr, *prev;
> -	struct iova *curr_iova;
>  	unsigned long flags;
> -	unsigned long new_pfn, retry_pfn;
> +	struct rb_node *curr;
> +	struct rb_node *parent;
> +	struct iova *curr_iova;
>  	unsigned long align_mask = ~0UL;
> -	unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
> +	bool ignore = false;
>
>  	if (size_aligned)
>  		align_mask <<= fls_long(size - 1);
>
> -	/* Walk the tree backwards */
>  	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> +
>  	if (limit_pfn <= iovad->dma_32bit_pfn &&
>  			size >= iovad->max32_alloc_size)
>  		goto iova32_full;
>
>  	curr = __get_cached_rbnode(iovad, limit_pfn);
>  	curr_iova = to_iova(curr);
> -	retry_pfn = curr_iova->pfn_hi + 1;
>
> -retry:
> -	do {
> -		high_pfn = min(high_pfn, curr_iova->pfn_lo);
> -		new_pfn = (high_pfn - size) & align_mask;
> -		prev = curr;
> -		curr = rb_prev(curr);
> -		curr_iova = to_iova(curr);
> -	} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
> -
> -	if (high_pfn < size || new_pfn < low_pfn) {
> -		if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
> -			high_pfn = limit_pfn;
> -			low_pfn = retry_pfn;
> -			curr = iova_find_limit(iovad, limit_pfn);
> -			curr_iova = to_iova(curr);
> -			goto retry;
> +	if (limit_pfn >= curr_iova->pfn_lo &&
> +	    curr_iova->allocatable_size >= size)
> +		goto found;
> +
> +	/* If limit_pfn > dma_32bit_pfn, this could be faster. */
> +	if (limit_pfn > iovad->dma_32bit_pfn) {
> +		curr_iova = to_iova(&iovad->anchor.node);
> +
> +		while (curr_iova) {
> +			if (check_interval(prev_iova_high(curr_iova),
> +					   curr_iova->pfn_lo, limit_pfn,
> +					   size, align_mask))
> +				goto found;
> +			curr_iova = curr_iova->prev_iova;
>  		}
>  		iovad->max32_alloc_size = size;
>  		goto iova32_full;
>  	}
>
> +	curr = iova_find_limit(iovad, limit_pfn);
> +	curr_iova = to_iova(curr);
> +
> +	if (check_interval(prev_iova_high(curr_iova),
> +			   curr_iova->pfn_lo, limit_pfn,
> +			   size, align_mask))
> +		goto found;
> +
> +	while (true) {
> +		/* Check left subtree */
> +		if (!ignore && curr->rb_left) {
> +			curr_iova = to_iova(curr->rb_left);
> +			if (curr_iova->max_allocatable_size >= size)
> +				goto check_subtree;
> +		}
> +
> +		parent = rb_parent(curr);
> +		if (parent == NULL)
> +			break;
> +		/*
> +		 * If the current node is the left child of its parent,
> +		 * the parent node and the parent's right subtree need not
> +		 * be checked because they exceed the limit_pfn.
> +		 */
> +		ignore = parent->rb_left == curr;
> +		curr = parent;
> +
> +		/* Check current node. */
> +		if (!ignore) {
> +			curr_iova = to_iova(curr);
> +			if (curr_iova->allocatable_size >= size)
> +				goto found;
> +		}
> +	}
> +	if (limit_pfn >= iovad->dma_32bit_pfn)
> +		iovad->max32_alloc_size = size;
> +	goto iova32_full;
> +
> +check_subtree:
> +	while (true) {
> +		if (curr_iova->allocatable_size >= size)
> +			goto found;
> +
> +		curr = &curr_iova->node;
> +		if (curr->rb_right &&
> +			to_iova(curr->rb_right)->max_allocatable_size >= size) {
> +			curr_iova = to_iova(curr->rb_right);
> +			continue;
> +		}
> +		WARN_ON(curr->rb_left == NULL);
> +		curr_iova = to_iova(curr->rb_left);
> +	}
> +
> +found:
>  	/* pfn_lo will point to size aligned address if size_aligned is set */
> -	new->pfn_lo = new_pfn;
> +	new->pfn_lo = (min(curr_iova->pfn_lo, limit_pfn) - size) & align_mask;
>  	new->pfn_hi = new->pfn_lo + size - 1;
>
> -	/* If we have 'prev', it's a valid place to start the insertion. */
> -	iova_insert_rbtree(&iovad->rbroot, new, prev);
> +	/*
> +	 * If we have 'prev' or 'next',
> +	 * it's a valid place to start the insertion.
> +	 */
> +	iova_insert_rbtree(&iovad->rbroot, new, &curr_iova->node);
>  	__cached_rbnode_insert_update(iovad, new);
>
>  	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> @@ -352,9 +475,18 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
>
>  static void remove_iova(struct iova_domain *iovad, struct iova *iova)
>  {
> +	struct rb_node *next;
> +	struct iova *next_iova;
>  	assert_spin_locked(&iovad->iova_rbtree_lock);
> -	__cached_rbnode_delete_update(iovad, iova);
> -	rb_erase(&iova->node, &iovad->rbroot);
> +
> +	next = rb_next(&iova->node);
> +	__cached_rbnode_delete_update(iovad, iova, next);
> +
> +	next_iova = to_iova(next);
> +	next_iova->prev_iova = iova->prev_iova;
> +	next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
> +	iova_max_allocatable_size_update(next_iova);
> +	rb_erase_augmented(&iova->node, &iovad->rbroot, &iova_gap_callbacks);
>  }
>
>  /**
> @@ -554,8 +686,11 @@ static void
>  __adjust_overlap_range(struct iova *iova,
>  	unsigned long *pfn_lo, unsigned long *pfn_hi)
>  {
> -	if (*pfn_lo < iova->pfn_lo)
> +	if (*pfn_lo < iova->pfn_lo) {
>  		iova->pfn_lo = *pfn_lo;
> +		iova->allocatable_size = iova_compute_allocatable_size(iova);
> +		iova_max_allocatable_size_update(iova);
> +	}
>  	if (*pfn_hi > iova->pfn_hi)
>  		*pfn_lo = iova->pfn_hi + 1;
>  }
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 320a70e40233..feb8121f104d 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -11,7 +11,7 @@
>
>  #include <linux/types.h>
>  #include <linux/kernel.h>
> -#include <linux/rbtree.h>
> +#include <linux/rbtree_augmented.h>
>  #include <linux/dma-mapping.h>
>
>  /* iova structure */
> @@ -19,6 +19,9 @@ struct iova {
>  	struct rb_node	node;
>  	unsigned long	pfn_hi; /* Highest allocated pfn */
>  	unsigned long	pfn_lo; /* Lowest allocated pfn */
> +	struct iova	*prev_iova;
> +	unsigned long	allocatable_size;
> +	unsigned long	max_allocatable_size;
>  };
>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-09-21  1:15 ` wangjie (L)
@ 2022-09-21  3:55   ` Peng Zhang
  2022-09-21 10:54     ` wangjie (L)
  0 siblings, 1 reply; 10+ messages in thread
From: Peng Zhang @ 2022-09-21  3:55 UTC (permalink / raw)
  To: wangjie (L), joro, will
  Cc: iommu, linux-kernel, robin.murphy, haifeng.zhao, john.garry

> This patch seems to solve the performance issues I have.  Currently my 
> nic's rx performance is unstable in large-capacity scenarios. I applied 
> this patch to 5.19-rc4 and tested rx performance 8 times in these 
> scenarios. Here are the test results: the "before" row is the result of 
> 5.19-rc4, the "after" row is 5.19-rc4 with this patch, and the unit is 
> Mbits/s.
> 
>         1      2      3      4      5      6      7      8
> before  55430  76701  84194  77560  88292  90106  87770  77273
> after   92770  92767  92792  92764  92742  92696  92781  92756
> 
> Obviously, after using this patch, the performance is stable.

Thank you for testing. Could you add a "Tested-by" tag?
I plan to update the commit message in the next version.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented
  2022-09-21  3:55   ` Peng Zhang
@ 2022-09-21 10:54     ` wangjie (L)
  0 siblings, 0 replies; 10+ messages in thread
From: wangjie (L) @ 2022-09-21 10:54 UTC (permalink / raw)
  To: Peng Zhang, joro, will
  Cc: iommu, linux-kernel, robin.murphy, haifeng.zhao, john.garry

Tested-by: Jie Wang <wangjie125@huawei.com>

On 2022/9/21 11:55, Peng Zhang wrote:
>> This patch seems to solve the performance issues I have.  Currently my
>> nic's rx performance is unstable in large-capacity scenarios. I applied
>> this patch to 5.19-rc4 and tested rx performance 8 times in these
>> scenarios. Here are the test results: the "before" row is the result of
>> 5.19-rc4, the "after" row is 5.19-rc4 with this patch, and the unit is
>> Mbits/s.
>>
>>         1      2      3      4      5      6      7      8
>> before  55430  76701  84194  77560  88292  90106  87770  77273
>> after   92770  92767  92792  92764  92742  92696  92781  92756
>>
>> Obviously, after using this patch, the performance is stable.
>
> Thank you for testing. Could you add a "Tested-by" tag?
> I plan to update the commit message in the next version.
> .
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-09-21 10:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-24  9:51 [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented Peng Zhang
2022-08-24 12:30 ` Ethan Zhao
2022-08-25  8:10   ` [External] " Peng Zhang
2022-08-26  8:58     ` Ethan Zhao
2022-08-26 10:28       ` Peng Zhang
2022-09-01 10:45         ` John Garry
2022-09-02  3:30           ` Peng Zhang
2022-09-21  1:15 ` wangjie (L)
2022-09-21  3:55   ` Peng Zhang
2022-09-21 10:54     ` wangjie (L)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).