linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Some fixes and cleanup for maple tree.
@ 2023-03-10 14:08 Peng Zhang
  2023-03-10 14:08 ` [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk() Peng Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 14:08 UTC (permalink / raw)
  To: Liam.Howlett; +Cc: linux-mm, linux-kernel, maple-tree, Peng Zhang

Hi,
There are some fixes for maple tree that may be needed.
When reviewing the maple tree I thought some code was verbose so I did some
cleanup and I double checked the boundaries so there should be no errors. Less
code is easier to maintain, and you can ignore it if you don't like it.
All patches passed the maple tree test program.

Thanks,
Peng.

Peng Zhang (4):
  maple_tree: Fix get wrong data_end in mtree_lookup_walk()
  maple_tree: Simplify mas_wr_node_walk()
  maple_tree: Fix a potential concurrency bug in RCU mode
  maple_tree: Simplify the code of mas_mab_cp()

 lib/maple_tree.c | 76 ++++++++++--------------------------------------
 1 file changed, 16 insertions(+), 60 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk()
  2023-03-10 14:08 [PATCH 0/4] Some fixes and cleanup for maple tree Peng Zhang
@ 2023-03-10 14:08 ` Peng Zhang
  2023-03-10 17:58   ` Liam R. Howlett
  2023-03-10 14:08 ` [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk() Peng Zhang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 14:08 UTC (permalink / raw)
  To: Liam.Howlett; +Cc: linux-mm, linux-kernel, maple-tree, Peng Zhang

if (likely(offset > end))
	max = pivots[offset];

The above code should be changed to if (likely(offset < end)), which is
correct. This affects the correctness of ma_data_end(). Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.

Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
---
 lib/maple_tree.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 646297cae5d1..b3164266cfde 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3875,18 +3875,13 @@ static inline void *mtree_lookup_walk(struct ma_state *mas)
 		end = ma_data_end(node, type, pivots, max);
 		if (unlikely(ma_dead_node(node)))
 			goto dead_node;
-
-		if (pivots[offset] >= mas->index)
-			goto next;
-
 		do {
-			offset++;
-		} while ((offset < end) && (pivots[offset] < mas->index));
-
-		if (likely(offset > end))
-			max = pivots[offset];
+			if (pivots[offset] >= mas->index) {
+				max = pivots[offset];
+				break;
+			}
+		} while (++offset < end);
 
-next:
 		slots = ma_slots(node, type);
 		next = mt_slot(mas->tree, slots, offset);
 		if (unlikely(ma_dead_node(node)))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk()
  2023-03-10 14:08 [PATCH 0/4] Some fixes and cleanup for maple tree Peng Zhang
  2023-03-10 14:08 ` [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk() Peng Zhang
@ 2023-03-10 14:08 ` Peng Zhang
  2023-03-10 19:20   ` Liam R. Howlett
  2023-03-10 14:08 ` [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode Peng Zhang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 14:08 UTC (permalink / raw)
  To: Liam.Howlett; +Cc: linux-mm, linux-kernel, maple-tree, Peng Zhang

Simplify code of mas_wr_node_walk() without changing functionality,
and improve readability.

Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
---
 lib/maple_tree.c | 34 +++++-----------------------------
 1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index b3164266cfde..4d15202a0692 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -2251,9 +2251,7 @@ static inline struct maple_enode *mte_node_or_none(struct maple_enode *enode)
 static inline void mas_wr_node_walk(struct ma_wr_state *wr_mas)
 {
 	struct ma_state *mas = wr_mas->mas;
-	unsigned char count;
-	unsigned char offset;
-	unsigned long index, min, max;
+	unsigned char count, offset;
 
 	if (unlikely(ma_is_dense(wr_mas->type))) {
 		wr_mas->r_max = wr_mas->r_min = mas->index;
@@ -2266,34 +2264,12 @@ static inline void mas_wr_node_walk(struct ma_wr_state *wr_mas)
 	count = wr_mas->node_end = ma_data_end(wr_mas->node, wr_mas->type,
 					       wr_mas->pivots, mas->max);
 	offset = mas->offset;
-	min = mas_safe_min(mas, wr_mas->pivots, offset);
-	if (unlikely(offset == count))
-		goto max;
-
-	max = wr_mas->pivots[offset];
-	index = mas->index;
-	if (unlikely(index <= max))
-		goto done;
-
-	if (unlikely(!max && offset))
-		goto max;
 
-	min = max + 1;
-	while (++offset < count) {
-		max = wr_mas->pivots[offset];
-		if (index <= max)
-			goto done;
-		else if (unlikely(!max))
-			break;
-
-		min = max + 1;
-	}
+	while (offset < count && mas->index > wr_mas->pivots[offset])
+		offset++;
 
-max:
-	max = mas->max;
-done:
-	wr_mas->r_max = max;
-	wr_mas->r_min = min;
+	wr_mas->r_max = offset < count ? wr_mas->pivots[offset] : mas->max;
+	wr_mas->r_min = mas_safe_min(mas, wr_mas->pivots, offset);
 	wr_mas->offset_end = mas->offset = offset;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode
  2023-03-10 14:08 [PATCH 0/4] Some fixes and cleanup for maple tree Peng Zhang
  2023-03-10 14:08 ` [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk() Peng Zhang
  2023-03-10 14:08 ` [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk() Peng Zhang
@ 2023-03-10 14:08 ` Peng Zhang
  2023-03-10 18:27   ` Liam R. Howlett
  2023-03-10 14:08 ` [PATCH 4/4] maple_tree: Simplify the code of mas_mab_cp() Peng Zhang
  2023-03-10 17:54 ` [PATCH 0/4] Some fixes and cleanup for maple tree Liam R. Howlett
  4 siblings, 1 reply; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 14:08 UTC (permalink / raw)
  To: Liam.Howlett; +Cc: linux-mm, linux-kernel, maple-tree, Peng Zhang

There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.

CPU1:
mtree_insert_range()
  mas_insert()
    mas_store_root()
      ...
      mas_root_expand()
        ...
        rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
        ma_set_meta(node, maple_leaf_64, 0, slot);    <---IP

CPU2:
mtree_load()
  mtree_lookup_walk()
    ma_data_end();

When CPU1 is about to execute the instruction pointed to by IP,
the ma_data_end() executed by CPU2 may return the wrong end position,
which will cause the value loaded by mtree_load() to be wrong.

An example of triggering the bug:

Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().

static DEFINE_MTREE(tree);
int work(void *p) {
	unsigned long val;
	for (int i = 0 ; i< 30; ++i) {
		val = (unsigned long)mtree_load(&tree, 8);
		mdelay(5);
		pr_info("%lu",val);
	}
	return 0;
}

mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);

In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL.
Fix it by put ma_set_meta() before rcu_assign_pointer().

Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
---
 lib/maple_tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4d15202a0692..de43ff19da72 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3635,10 +3635,9 @@ static inline int mas_root_expand(struct ma_state *mas, void *entry)
 		slot++;
 	mas->depth = 1;
 	mas_set_height(mas);
-
+	ma_set_meta(node, maple_leaf_64, 0, slot);
 	/* swap the new root into the tree */
 	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
-	ma_set_meta(node, maple_leaf_64, 0, slot);
 	return slot;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] maple_tree: Simplify the code of mas_mab_cp()
  2023-03-10 14:08 [PATCH 0/4] Some fixes and cleanup for maple tree Peng Zhang
                   ` (2 preceding siblings ...)
  2023-03-10 14:08 ` [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode Peng Zhang
@ 2023-03-10 14:08 ` Peng Zhang
  2023-03-10 18:45   ` Liam R. Howlett
  2023-03-10 17:54 ` [PATCH 0/4] Some fixes and cleanup for maple tree Liam R. Howlett
  4 siblings, 1 reply; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 14:08 UTC (permalink / raw)
  To: Liam.Howlett; +Cc: linux-mm, linux-kernel, maple-tree, Peng Zhang

Simplify the code of mas_mab_cp(), and improve readability.
No change in functionality.

Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
---
 lib/maple_tree.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index de43ff19da72..688b062728a2 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1914,32 +1914,18 @@ static inline void mas_mab_cp(struct ma_state *mas, unsigned char mas_start,
 	void __rcu **slots;
 	unsigned long *pivots, *gaps;
 	int i = mas_start, j = mab_start;
-	unsigned char piv_end;
 
 	node = mas_mn(mas);
 	mt = mte_node_type(mas->node);
 	pivots = ma_pivots(node, mt);
-	if (!i) {
-		b_node->pivot[j] = pivots[i++];
-		if (unlikely(i > mas_end))
-			goto complete;
-		j++;
-	}
 
-	piv_end = min(mas_end, mt_pivots[mt]);
-	for (; i < piv_end; i++, j++) {
-		b_node->pivot[j] = pivots[i];
-		if (unlikely(!b_node->pivot[j]))
+	for (; i < min(mas_end, mt_pivots[mt]); i++, j++) {
+		if (unlikely(!pivots[i] && i) ||
+		    unlikely(mas->max == pivots[i]))
 			break;
-
-		if (unlikely(mas->max == b_node->pivot[j]))
-			goto complete;
+		b_node->pivot[j] = pivots[i];
 	}
-
-	if (likely(i <= mas_end))
-		b_node->pivot[j] = mas_safe_pivot(mas, pivots, i, mt);
-
-complete:
+	b_node->pivot[j] = mas_safe_pivot(mas, pivots, i, mt);
 	b_node->b_end = ++j;
 	j -= mab_start;
 	slots = ma_slots(node, mt);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] Some fixes and cleanup for maple tree.
  2023-03-10 14:08 [PATCH 0/4] Some fixes and cleanup for maple tree Peng Zhang
                   ` (3 preceding siblings ...)
  2023-03-10 14:08 ` [PATCH 4/4] maple_tree: Simplify the code of mas_mab_cp() Peng Zhang
@ 2023-03-10 17:54 ` Liam R. Howlett
  4 siblings, 0 replies; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 17:54 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> Hi,
> There are some fixes for maple tree that may be needed.
> When reviewing the maple tree I thought some code was verbose so I did some
> cleanup and I double checked the boundaries so there should be no errors. Less
> code is easier to maintain, and you can ignore it if you don't like it.
> All patches passed the maple tree test program.

If you have a bug, please add a test case to the module if it is easy to
do so, otherwise please add a test case to the userspace portion (ie:
things that need rcu or threading, which is more difficult in the
kernel).

> 
> Thanks,
> Peng.
> 
> Peng Zhang (4):
>   maple_tree: Fix get wrong data_end in mtree_lookup_walk()
>   maple_tree: Simplify mas_wr_node_walk()
>   maple_tree: Fix a potential concurrency bug in RCU mode
>   maple_tree: Simplify the code of mas_mab_cp()
> 
>  lib/maple_tree.c | 76 ++++++++++--------------------------------------
>  1 file changed, 16 insertions(+), 60 deletions(-)
> 
> -- 
> 2.20.1
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk()
  2023-03-10 14:08 ` [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk() Peng Zhang
@ 2023-03-10 17:58   ` Liam R. Howlett
  2023-03-10 18:53     ` Peng Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 17:58 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> if (likely(offset > end))
> 	max = pivots[offset];
> 
> The above code should be changed to if (likely(offset < end)), which is
> correct. This affects the correctness of ma_data_end().

No.  The way it is written is correct.  If we are not at the last slot,
then we take the pivot as the max for the next level of the tree.  If we
are at the last slot, then the max is already the correct value.

>Now it seems
> that the final result will not be wrong, but it is best to change it.

Why is it best to change it?

> This patch does not change the code as above, because it simplifies the
> code by the way.
> 
> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> ---
>  lib/maple_tree.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 646297cae5d1..b3164266cfde 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -3875,18 +3875,13 @@ static inline void *mtree_lookup_walk(struct ma_state *mas)
>  		end = ma_data_end(node, type, pivots, max);
>  		if (unlikely(ma_dead_node(node)))
>  			goto dead_node;
> -
> -		if (pivots[offset] >= mas->index)
> -			goto next;
> -
>  		do {
> -			offset++;
> -		} while ((offset < end) && (pivots[offset] < mas->index));
> -
> -		if (likely(offset > end))
> -			max = pivots[offset];
> +			if (pivots[offset] >= mas->index) {
> +				max = pivots[offset];

You can overflow the pivots array here because offset can actually be
larger than the array.  I am surprised this passes the maple tree test
program, but with a full node and walking to the end, it will address
the pivots array out of bounds.

I wrote it the way I did to minimize the instructions in the loop by
avoiding the overflow check.

> +				break;
> +			}
> +		} while (++offset < end);
>  
> -next:
>  		slots = ma_slots(node, type);
>  		next = mt_slot(mas->tree, slots, offset);
>  		if (unlikely(ma_dead_node(node)))
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode
  2023-03-10 14:08 ` [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode Peng Zhang
@ 2023-03-10 18:27   ` Liam R. Howlett
  2023-03-10 19:03     ` Peng Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 18:27 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> There is a concurrency bug that may cause the wrong value to be loaded
> when a CPU is modifying the maple tree.
> 
> CPU1:
> mtree_insert_range()
>   mas_insert()
>     mas_store_root()
>       ...
>       mas_root_expand()
>         ...
>         rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
>         ma_set_meta(node, maple_leaf_64, 0, slot);    <---IP
> 
> CPU2:
> mtree_load()
>   mtree_lookup_walk()
>     ma_data_end();
> 
> When CPU1 is about to execute the instruction pointed to by IP,
> the ma_data_end() executed by CPU2 may return the wrong end position,
> which will cause the value loaded by mtree_load() to be wrong.
> 
> An example of triggering the bug:
> 
> Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
> mas_root_expand().
> 
> static DEFINE_MTREE(tree);
> int work(void *p) {
> 	unsigned long val;
> 	for (int i = 0 ; i< 30; ++i) {
> 		val = (unsigned long)mtree_load(&tree, 8);
> 		mdelay(5);
> 		pr_info("%lu",val);
> 	}
> 	return 0;
> }
> 
> mt_init_flags(&tree, MT_FLAGS_USE_RCU);
> mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
> run_thread(work)
> mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
> 
> In RCU mode, mtree_load() should always return the value before or after
> the data structure is modified, and in this example mtree_load(&tree, 8)
> may return 56789 which is not expected, it should always return NULL.
> Fix it by put ma_set_meta() before rcu_assign_pointer().

Are you able to write a test case for this issue?  I understand it's a
race so it may be difficult to catch.

> 
> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  lib/maple_tree.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 4d15202a0692..de43ff19da72 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -3635,10 +3635,9 @@ static inline int mas_root_expand(struct ma_state *mas, void *entry)
>  		slot++;
>  	mas->depth = 1;
>  	mas_set_height(mas);
> -
> +	ma_set_meta(node, maple_leaf_64, 0, slot);
>  	/* swap the new root into the tree */
>  	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
> -	ma_set_meta(node, maple_leaf_64, 0, slot);
>  	return slot;
>  }
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] maple_tree: Simplify the code of mas_mab_cp()
  2023-03-10 14:08 ` [PATCH 4/4] maple_tree: Simplify the code of mas_mab_cp() Peng Zhang
@ 2023-03-10 18:45   ` Liam R. Howlett
  0 siblings, 0 replies; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 18:45 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> Simplify the code of mas_mab_cp(), and improve readability.
> No change in functionality.
> 
> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> ---
>  lib/maple_tree.c | 24 +++++-------------------
>  1 file changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index de43ff19da72..688b062728a2 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -1914,32 +1914,18 @@ static inline void mas_mab_cp(struct ma_state *mas, unsigned char mas_start,
>  	void __rcu **slots;
>  	unsigned long *pivots, *gaps;
>  	int i = mas_start, j = mab_start;
> -	unsigned char piv_end;
>  
>  	node = mas_mn(mas);
>  	mt = mte_node_type(mas->node);
>  	pivots = ma_pivots(node, mt);
> -	if (!i) {
> -		b_node->pivot[j] = pivots[i++];
> -		if (unlikely(i > mas_end))
> -			goto complete;
> -		j++;
> -	}
>  
> -	piv_end = min(mas_end, mt_pivots[mt]);
> -	for (; i < piv_end; i++, j++) {
> -		b_node->pivot[j] = pivots[i];
> -		if (unlikely(!b_node->pivot[j]))
> +	for (; i < min(mas_end, mt_pivots[mt]); i++, j++) {

Please don't inline the min here, it is not improving readability.

> +		if (unlikely(!pivots[i] && i) ||
> +		    unlikely(mas->max == pivots[i]))

By not doing the special case outside the loop, you have added a check
to every loop iteration.  I took these special cases out after profiling
the code with perf.  I get they aren't as easy to read but they are
faster which is important for something executed this much.

You have also made this if statement more complex which is not improving
readability.

>  			break;
> -
> -		if (unlikely(mas->max == b_node->pivot[j]))
> -			goto complete;
> +		b_node->pivot[j] = pivots[i];
>  	}
> -
> -	if (likely(i <= mas_end))
> -		b_node->pivot[j] = mas_safe_pivot(mas, pivots, i, mt);
> -
> -complete:
> +	b_node->pivot[j] = mas_safe_pivot(mas, pivots, i, mt);
>  	b_node->b_end = ++j;
>  	j -= mab_start;
>  	slots = ma_slots(node, mt);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk()
  2023-03-10 17:58   ` Liam R. Howlett
@ 2023-03-10 18:53     ` Peng Zhang
  2023-03-10 19:28       ` Liam R. Howlett
  0 siblings, 1 reply; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 18:53 UTC (permalink / raw)
  To: Liam R. Howlett, Peng Zhang, linux-mm, linux-kernel, maple-tree


在 2023/3/11 01:58, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
>> if (likely(offset > end))
>> 	max = pivots[offset];
>>
>> The above code should be changed to if (likely(offset < end)), which is
>> correct. This affects the correctness of ma_data_end().
> No.  The way it is written is correct.  If we are not at the last slot,
> then we take the pivot as the max for the next level of the tree.  If we
> are at the last slot, then the max is already the correct value.

As you said, If we are not at the last slot, we take the pivot as the max

for the next level of the tree. At this time, “offset < end” is satisfied,

but in the original code, when offset > end, take the pivot as the max.

Please *think again*, it is wrong. The code may have been written 
incorrectly
by mistake, not what you said it was written.

>> Now it seems
>> that the final result will not be wrong, but it is best to change it.
> Why is it best to change it?
>
>> This patch does not change the code as above, because it simplifies the
>> code by the way.
>>
>> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
>> ---
>>   lib/maple_tree.c | 15 +++++----------
>>   1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 646297cae5d1..b3164266cfde 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3875,18 +3875,13 @@ static inline void *mtree_lookup_walk(struct ma_state *mas)
>>   		end = ma_data_end(node, type, pivots, max);
>>   		if (unlikely(ma_dead_node(node)))
>>   			goto dead_node;
>> -
>> -		if (pivots[offset] >= mas->index)
>> -			goto next;
>> -
>>   		do {
>> -			offset++;
>> -		} while ((offset < end) && (pivots[offset] < mas->index));
>> -
>> -		if (likely(offset > end))
>> -			max = pivots[offset];
>> +			if (pivots[offset] >= mas->index) {
>> +				max = pivots[offset];
> You can overflow the pivots array here because offset can actually be
> larger than the array.  I am surprised this passes the maple tree test
> program, but with a full node and walking to the end, it will address
> the pivots array out of bounds.
>
> I wrote it the way I did to minimize the instructions in the loop by
> avoiding the overflow check.

It is not possible overflow pivots array, because only when
"while (++offset < end)" is satisfied, we enter the loop body.
So if we access pivots[offset], “offset < end” must be satisfied.
Maybe you need to read the code to know, instead of looking at
the diff.

The modified code looks like this:

         do {
             if (pivots[offset] >= mas->index) {
                 max = pivots[offset];
                 break;
             }
         } while (++offset < end);

>> +				break;
>> +			}
>> +		} while (++offset < end);
>>   
>> -next:
>>   		slots = ma_slots(node, type);
>>   		next = mt_slot(mas->tree, slots, offset);
>>   		if (unlikely(ma_dead_node(node)))
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode
  2023-03-10 18:27   ` Liam R. Howlett
@ 2023-03-10 19:03     ` Peng Zhang
  2023-03-10 19:29       ` Liam R. Howlett
  0 siblings, 1 reply; 15+ messages in thread
From: Peng Zhang @ 2023-03-10 19:03 UTC (permalink / raw)
  To: Liam R. Howlett, Peng Zhang, linux-mm, linux-kernel, maple-tree


在 2023/3/11 02:27, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
>> There is a concurrency bug that may cause the wrong value to be loaded
>> when a CPU is modifying the maple tree.
>>
>> CPU1:
>> mtree_insert_range()
>>    mas_insert()
>>      mas_store_root()
>>        ...
>>        mas_root_expand()
>>          ...
>>          rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
>>          ma_set_meta(node, maple_leaf_64, 0, slot);    <---IP
>>
>> CPU2:
>> mtree_load()
>>    mtree_lookup_walk()
>>      ma_data_end();
>>
>> When CPU1 is about to execute the instruction pointed to by IP,
>> the ma_data_end() executed by CPU2 may return the wrong end position,
>> which will cause the value loaded by mtree_load() to be wrong.
>>
>> An example of triggering the bug:
>>
>> Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
>> mas_root_expand().
>>
>> static DEFINE_MTREE(tree);
>> int work(void *p) {
>> 	unsigned long val;
>> 	for (int i = 0 ; i< 30; ++i) {
>> 		val = (unsigned long)mtree_load(&tree, 8);
>> 		mdelay(5);
>> 		pr_info("%lu",val);
>> 	}
>> 	return 0;
>> }
>>
>> mt_init_flags(&tree, MT_FLAGS_USE_RCU);
>> mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
>> run_thread(work)
>> mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
>>
>> In RCU mode, mtree_load() should always return the value before or after
>> the data structure is modified, and in this example mtree_load(&tree, 8)
>> may return 56789 which is not expected, it should always return NULL.
>> Fix it by put ma_set_meta() before rcu_assign_pointer().
> Are you able to write a test case for this issue?  I understand it's a
> race so it may be difficult to catch.
Yes it's hard to catch. I'll try to think of a test case next week.
It is difficult because of the need to expand the competitive area.
>
>> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
>> ---
>>   lib/maple_tree.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 4d15202a0692..de43ff19da72 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3635,10 +3635,9 @@ static inline int mas_root_expand(struct ma_state *mas, void *entry)
>>   		slot++;
>>   	mas->depth = 1;
>>   	mas_set_height(mas);
>> -
>> +	ma_set_meta(node, maple_leaf_64, 0, slot);
>>   	/* swap the new root into the tree */
>>   	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
>> -	ma_set_meta(node, maple_leaf_64, 0, slot);
>>   	return slot;
>>   }
>>   
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk()
  2023-03-10 14:08 ` [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk() Peng Zhang
@ 2023-03-10 19:20   ` Liam R. Howlett
  2023-03-13 14:07     ` Peng Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 19:20 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> Simplify code of mas_wr_node_walk() without changing functionality,
> and improve readability.

The change log needs to be updated to 

I like this change, thanks.

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> 
> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> ---
>  lib/maple_tree.c | 34 +++++-----------------------------
>  1 file changed, 5 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index b3164266cfde..4d15202a0692 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -2251,9 +2251,7 @@ static inline struct maple_enode *mte_node_or_none(struct maple_enode *enode)
>  static inline void mas_wr_node_walk(struct ma_wr_state *wr_mas)
>  {
>  	struct ma_state *mas = wr_mas->mas;
> -	unsigned char count;
> -	unsigned char offset;
> -	unsigned long index, min, max;
> +	unsigned char count, offset;
>  
>  	if (unlikely(ma_is_dense(wr_mas->type))) {
>  		wr_mas->r_max = wr_mas->r_min = mas->index;
> @@ -2266,34 +2264,12 @@ static inline void mas_wr_node_walk(struct ma_wr_state *wr_mas)
>  	count = wr_mas->node_end = ma_data_end(wr_mas->node, wr_mas->type,
>  					       wr_mas->pivots, mas->max);
>  	offset = mas->offset;
> -	min = mas_safe_min(mas, wr_mas->pivots, offset);
> -	if (unlikely(offset == count))
> -		goto max;
> -
> -	max = wr_mas->pivots[offset];
> -	index = mas->index;
> -	if (unlikely(index <= max))
> -		goto done;
> -
> -	if (unlikely(!max && offset))
> -		goto max;
>  
> -	min = max + 1;
> -	while (++offset < count) {
> -		max = wr_mas->pivots[offset];
> -		if (index <= max)
> -			goto done;
> -		else if (unlikely(!max))
> -			break;
> -
> -		min = max + 1;
> -	}
> +	while (offset < count && mas->index > wr_mas->pivots[offset])
> +		offset++;
>  
> -max:
> -	max = mas->max;
> -done:
> -	wr_mas->r_max = max;
> -	wr_mas->r_min = min;
> +	wr_mas->r_max = offset < count ? wr_mas->pivots[offset] : mas->max;
> +	wr_mas->r_min = mas_safe_min(mas, wr_mas->pivots, offset);
>  	wr_mas->offset_end = mas->offset = offset;
>  }
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk()
  2023-03-10 18:53     ` Peng Zhang
@ 2023-03-10 19:28       ` Liam R. Howlett
  0 siblings, 0 replies; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 19:28 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 13:53]:
> 
> 在 2023/3/11 01:58, Liam R. Howlett 写道:
> > * Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> > > if (likely(offset > end))
> > > 	max = pivots[offset];
> > > 
> > > The above code should be changed to if (likely(offset < end)), which is
> > > correct. This affects the correctness of ma_data_end().
> > No.  The way it is written is correct.  If we are not at the last slot,
> > then we take the pivot as the max for the next level of the tree.  If we
> > are at the last slot, then the max is already the correct value.
> 
> As you said, If we are not at the last slot, we take the pivot as the max
> 
for the next level of the tree. At this time, “offset < end” is satisfied,

> but in the original code, when offset > end, take the pivot as the max.
> 
Please *think again*, it is wrong. The code may have been written
> incorrectly
> by mistake, not what you said it was written.

Sorry, yes.  That does look like a bug.

> 
> > > Now it seems
> > > that the final result will not be wrong, but it is best to change it.
> > Why is it best to change it?
> > 
> > > This patch does not change the code as above, because it simplifies the
> > > code by the way.
> > > 
> > > Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> > > ---
> > >   lib/maple_tree.c | 15 +++++----------
> > >   1 file changed, 5 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > > index 646297cae5d1..b3164266cfde 100644
> > > --- a/lib/maple_tree.c
> > > +++ b/lib/maple_tree.c
> > > @@ -3875,18 +3875,13 @@ static inline void *mtree_lookup_walk(struct ma_state *mas)
> > >   		end = ma_data_end(node, type, pivots, max);
> > >   		if (unlikely(ma_dead_node(node)))
> > >   			goto dead_node;
> > > -
> > > -		if (pivots[offset] >= mas->index)
> > > -			goto next;
> > > -
> > >   		do {
> > > -			offset++;
> > > -		} while ((offset < end) && (pivots[offset] < mas->index));
> > > -
> > > -		if (likely(offset > end))
> > > -			max = pivots[offset];
> > > +			if (pivots[offset] >= mas->index) {
> > > +				max = pivots[offset];
> > You can overflow the pivots array here because offset can actually be
> > larger than the array.  I am surprised this passes the maple tree test
> > program, but with a full node and walking to the end, it will address
> > the pivots array out of bounds.
> > 
> > I wrote it the way I did to minimize the instructions in the loop by
> > avoiding the overflow check.
> 
> It is not possible overflow pivots array, because only when
> "while (++offset < end)" is satisfied, we enter the loop body.
> So if we access pivots[offset], “offset < end” must be satisfied.
> Maybe you need to read the code to know, instead of looking at
> the diff.
> 
> The modified code looks like this:
> 
>         do {
>             if (pivots[offset] >= mas->index) {
>                 max = pivots[offset];
>                 break;
>             }
>         } while (++offset < end);
> 

Yes, you are right.  It will terminate before overflowing.

Since this is a fix, it needs a fixes tag in the commit log.  Can you
come up with a test case for this one as well?

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> > > +				break;
> > > +			}
> > > +		} while (++offset < end);
> > > -next:
> > >   		slots = ma_slots(node, type);
> > >   		next = mt_slot(mas->tree, slots, offset);
> > >   		if (unlikely(ma_dead_node(node)))
> > > -- 
> > > 2.20.1
> > > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode
  2023-03-10 19:03     ` Peng Zhang
@ 2023-03-10 19:29       ` Liam R. Howlett
  0 siblings, 0 replies; 15+ messages in thread
From: Liam R. Howlett @ 2023-03-10 19:29 UTC (permalink / raw)
  To: Peng Zhang; +Cc: linux-mm, linux-kernel, maple-tree

* Peng Zhang <zhangpeng.00@bytedance.com> [230310 14:03]:
> 
> 在 2023/3/11 02:27, Liam R. Howlett 写道:
> > * Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
> > > There is a concurrency bug that may cause the wrong value to be loaded
> > > when a CPU is modifying the maple tree.
> > > 
> > > CPU1:
> > > mtree_insert_range()
> > >    mas_insert()
> > >      mas_store_root()
> > >        ...
> > >        mas_root_expand()
> > >          ...
> > >          rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
> > >          ma_set_meta(node, maple_leaf_64, 0, slot);    <---IP
> > > 
> > > CPU2:
> > > mtree_load()
> > >    mtree_lookup_walk()
> > >      ma_data_end();
> > > 
> > > When CPU1 is about to execute the instruction pointed to by IP,
> > > the ma_data_end() executed by CPU2 may return the wrong end position,
> > > which will cause the value loaded by mtree_load() to be wrong.
> > > 
> > > An example of triggering the bug:
> > > 
> > > Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
> > > mas_root_expand().
> > > 
> > > static DEFINE_MTREE(tree);
> > > int work(void *p) {
> > > 	unsigned long val;
> > > 	for (int i = 0 ; i< 30; ++i) {
> > > 		val = (unsigned long)mtree_load(&tree, 8);
> > > 		mdelay(5);
> > > 		pr_info("%lu",val);
> > > 	}
> > > 	return 0;
> > > }
> > > 
> > > mt_init_flags(&tree, MT_FLAGS_USE_RCU);
> > > mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
> > > run_thread(work)
> > > mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
> > > 
> > > In RCU mode, mtree_load() should always return the value before or after
> > > the data structure is modified, and in this example mtree_load(&tree, 8)
> > > may return 56789 which is not expected, it should always return NULL.
> > > Fix it by put ma_set_meta() before rcu_assign_pointer().
> > Are you able to write a test case for this issue?  I understand it's a
> > race so it may be difficult to catch.
> Yes it's hard to catch. I'll try to think of a test case next week.
> It is difficult because of the need to expand the competitive area.

This should have a fixes tag as well.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")

> > 
> > > Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > 
> > > ---
> > >   lib/maple_tree.c | 3 +--
> > >   1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > > index 4d15202a0692..de43ff19da72 100644
> > > --- a/lib/maple_tree.c
> > > +++ b/lib/maple_tree.c
> > > @@ -3635,10 +3635,9 @@ static inline int mas_root_expand(struct ma_state *mas, void *entry)
> > >   		slot++;
> > >   	mas->depth = 1;
> > >   	mas_set_height(mas);
> > > -
> > > +	ma_set_meta(node, maple_leaf_64, 0, slot);
> > >   	/* swap the new root into the tree */
> > >   	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
> > > -	ma_set_meta(node, maple_leaf_64, 0, slot);
> > >   	return slot;
> > >   }
> > > -- 
> > > 2.20.1
> > > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk()
  2023-03-10 19:20   ` Liam R. Howlett
@ 2023-03-13 14:07     ` Peng Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Peng Zhang @ 2023-03-13 14:07 UTC (permalink / raw)
  To: Liam R. Howlett, Peng Zhang, linux-mm, linux-kernel, maple-tree


在 2023/3/11 03:20, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@bytedance.com> [230310 09:09]:
>> Simplify code of mas_wr_node_walk() without changing functionality,
>> and improve readability.
> The change log needs to be updated to

Hi,

Update to what? Did you not finish typing?

>
> I like this change, thanks.
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
>> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
>> ---
>>   lib/maple_tree.c | 34 +++++-----------------------------
>>   1 file changed, 5 insertions(+), 29 deletions(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index b3164266cfde..4d15202a0692 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -2251,9 +2251,7 @@ static inline struct maple_enode *mte_node_or_none(struct maple_enode *enode)
>>   static inline void mas_wr_node_walk(struct ma_wr_state *wr_mas)
>>   {
>>   	struct ma_state *mas = wr_mas->mas;
>> -	unsigned char count;
>> -	unsigned char offset;
>> -	unsigned long index, min, max;
>> +	unsigned char count, offset;
>>   
>>   	if (unlikely(ma_is_dense(wr_mas->type))) {
>>   		wr_mas->r_max = wr_mas->r_min = mas->index;
>> @@ -2266,34 +2264,12 @@ static inline void mas_wr_node_walk(struct ma_wr_state *wr_mas)
>>   	count = wr_mas->node_end = ma_data_end(wr_mas->node, wr_mas->type,
>>   					       wr_mas->pivots, mas->max);
>>   	offset = mas->offset;
>> -	min = mas_safe_min(mas, wr_mas->pivots, offset);
>> -	if (unlikely(offset == count))
>> -		goto max;
>> -
>> -	max = wr_mas->pivots[offset];
>> -	index = mas->index;
>> -	if (unlikely(index <= max))
>> -		goto done;
>> -
>> -	if (unlikely(!max && offset))
>> -		goto max;
>>   
>> -	min = max + 1;
>> -	while (++offset < count) {
>> -		max = wr_mas->pivots[offset];
>> -		if (index <= max)
>> -			goto done;
>> -		else if (unlikely(!max))
>> -			break;
>> -
>> -		min = max + 1;
>> -	}
>> +	while (offset < count && mas->index > wr_mas->pivots[offset])
>> +		offset++;
>>   
>> -max:
>> -	max = mas->max;
>> -done:
>> -	wr_mas->r_max = max;
>> -	wr_mas->r_min = min;
>> +	wr_mas->r_max = offset < count ? wr_mas->pivots[offset] : mas->max;
>> +	wr_mas->r_min = mas_safe_min(mas, wr_mas->pivots, offset);
>>   	wr_mas->offset_end = mas->offset = offset;
>>   }
>>   
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-03-13 14:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-10 14:08 [PATCH 0/4] Some fixes and cleanup for maple tree Peng Zhang
2023-03-10 14:08 ` [PATCH 1/4] maple_tree: Fix get wrong data_end in mtree_lookup_walk() Peng Zhang
2023-03-10 17:58   ` Liam R. Howlett
2023-03-10 18:53     ` Peng Zhang
2023-03-10 19:28       ` Liam R. Howlett
2023-03-10 14:08 ` [PATCH 2/4] maple_tree: Simplify mas_wr_node_walk() Peng Zhang
2023-03-10 19:20   ` Liam R. Howlett
2023-03-13 14:07     ` Peng Zhang
2023-03-10 14:08 ` [PATCH 3/4] maple_tree: Fix a potential concurrency bug in RCU mode Peng Zhang
2023-03-10 18:27   ` Liam R. Howlett
2023-03-10 19:03     ` Peng Zhang
2023-03-10 19:29       ` Liam R. Howlett
2023-03-10 14:08 ` [PATCH 4/4] maple_tree: Simplify the code of mas_mab_cp() Peng Zhang
2023-03-10 18:45   ` Liam R. Howlett
2023-03-10 17:54 ` [PATCH 0/4] Some fixes and cleanup for maple tree Liam R. Howlett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).