* [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth
@ 2020-05-15 12:47 Lai Jiangshan
  2020-05-15 12:47 ` [PATCH 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-15 12:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Peter Zijlstra, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Mathieu Desnoyers

lib/rbtree.c has ensured that it is not possible to
inadvertently cause (temporary) loops in the tree structure
as seen in the program order of the modifier. But a loop can
still be seen by a searcher due to the CPU's reordering.

for example:
modifier				searcher

left rotate at parent
parent->rb_right is node
					search to parent
					parent->rb_right is node
				     +->see node->rb_left changed
WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parent
no smp_wmb(), some arch can        | |
reorder these two writes           | |  loop long between
WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
				   |
				   +--->finally see
					parent->rb_right

The long loop won't stop until the modifier's CPU flushes
its writes. To avoid it, we should limit the search depth.
There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree,
and the max depth of a tree is no more than 2*lg(node_count+1),
which is no more than 2*BITS_PER_LONG.

So the search should stop when it dives down to
2*BITS_PER_LONG depth.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/rbtree_latch.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 7d012faa509a..b012bd95eabf 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -107,10 +107,11 @@ __lt_find(void *key, struct latch_tree_root *ltr, int idx,
 	  int (*comp)(void *key, struct latch_tree_node *node))
 {
 	struct rb_node *node = rcu_dereference_raw(ltr->tree[idx].rb_node);
+	int depth = 2 * BITS_PER_LONG;
 	struct latch_tree_node *ltn;
 	int c;
 
-	while (node) {
+	while (node && depth > 0) {
 		ltn = __lt_from_rb(node, idx);
 		c = comp(key, ltn);
 
@@ -120,6 +121,7 @@ __lt_find(void *key, struct latch_tree_root *ltr, int idx,
 			node = rcu_dereference_raw(node->rb_right);
 		else
 			return ltn;
+		depth--;
 	}
 
 	return NULL;
-- 
2.20.1



* [PATCH 2/2] rbtree_latch: don't need to check seq when it found a node
  2020-05-15 12:47 [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth Lai Jiangshan
@ 2020-05-15 12:47 ` Lai Jiangshan
  2020-05-15 13:04   ` Peter Zijlstra
  2020-05-15 13:00 ` [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth Peter Zijlstra
  2020-05-15 13:14 ` [PATCH " Mathieu Desnoyers
  2 siblings, 1 reply; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-15 12:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Peter Zijlstra, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Mathieu Desnoyers

latch_tree_find() should be protected by the caller via RCU or similar.
When an attempt finds a node, that node must be a valid one.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/rbtree_latch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index b012bd95eabf..09a3c05d1c5b 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -208,7 +208,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_retry(&root->seq, seq));
+	} while (!node && read_seqcount_retry(&root->seq, seq));
 
 	return node;
 }
-- 
2.20.1



* Re: [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth
  2020-05-15 12:47 [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth Lai Jiangshan
  2020-05-15 12:47 ` [PATCH 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
@ 2020-05-15 13:00 ` Peter Zijlstra
  2020-05-15 14:39   ` Lai Jiangshan
  2020-05-15 13:14 ` [PATCH " Mathieu Desnoyers
  2 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2020-05-15 13:00 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Mathieu Desnoyers

On Fri, May 15, 2020 at 12:47:06PM +0000, Lai Jiangshan wrote:
> lib/rbtree.c has ensured that there is not possible to
> inadvertently cause (temporary) loops in the tree structure
> as seen in program order of the modifier. But loop is still
> possible to be seen in searcher due to CPU's reordering.
> 
> for example:
> modifier				searcher
> 
> left rotate at parent
> parent->rb_right is node
> 					search to parent
> 					parent->rb_right is node
> 				     +->see node->rb_left changed
> WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parennt
> no smp_wmb(), some arch can        | |
> reorder these two writes           | |  loop long between
> WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
> 				   |
> 				   +--->finally see
> 					parent->rb_right
> 
> The long loop won't stop until the modifer's CPU flushes
> its writes. Too avoid it, we should limit the searching depth.

Cute, have you actually observed this? Did you have performance issues?

> There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree.
> And the max_depth of a tree is no more than 2*lg(node_count+1),
> which is no mare than 2*BITS_PER_LONG.
> 
> So the serarch should stop when diving down up to
> 2*BITS_PER_LONG depth.

Arguably you can have a larger key space, but I think due to memory
constraints this limit still isn't wrong. But I do feel you need a
comment with that.


* Re: [PATCH 2/2] rbtree_latch: don't need to check seq when it found a node
  2020-05-15 12:47 ` [PATCH 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
@ 2020-05-15 13:04   ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2020-05-15 13:04 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Mathieu Desnoyers

On Fri, May 15, 2020 at 12:47:07PM +0000, Lai Jiangshan wrote:
> latch_tree_find() should be protected by caller via RCU or so.
> When it find a node in an attempt, the node must be a valid one.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>  include/linux/rbtree_latch.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
> index b012bd95eabf..09a3c05d1c5b 100644
> --- a/include/linux/rbtree_latch.h
> +++ b/include/linux/rbtree_latch.h
> @@ -208,7 +208,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
>  	do {
>  		seq = raw_read_seqcount_latch(&root->seq);
>  		node = __lt_find(key, root, seq & 1, ops->comp);
> -	} while (read_seqcount_retry(&root->seq, seq));
> +	} while (!node && read_seqcount_retry(&root->seq, seq));

So in the case where we search for key=N and race with { erase(N);
insert(N) }, we can now return the old N, as opposed to the new N.

But given that is entirely subject to timing anyway, that is irrelevant.
We change the boundary case in the timing.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth
  2020-05-15 12:47 [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth Lai Jiangshan
  2020-05-15 12:47 ` [PATCH 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
  2020-05-15 13:00 ` [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth Peter Zijlstra
@ 2020-05-15 13:14 ` Mathieu Desnoyers
  2 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2020-05-15 13:14 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Peter Zijlstra, paulmck, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

----- On May 15, 2020, at 8:47 AM, Lai Jiangshan laijs@linux.alibaba.com wrote:
[...]
> So the serarch should stop when diving down up to
> 2*BITS_PER_LONG depth.

serarch -> search

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth
  2020-05-15 13:00 ` [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth Peter Zijlstra
@ 2020-05-15 14:39   ` Lai Jiangshan
  2020-05-15 15:01     ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-15 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lai Jiangshan, LKML, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Mathieu Desnoyers

On Fri, May 15, 2020 at 9:04 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, May 15, 2020 at 12:47:06PM +0000, Lai Jiangshan wrote:
> > lib/rbtree.c has ensured that there is not possible to
> > inadvertently cause (temporary) loops in the tree structure
> > as seen in program order of the modifier. But loop is still
> > possible to be seen in searcher due to CPU's reordering.
> >
> > for example:
> > modifier                              searcher
> >
> > left rotate at parent
> > parent->rb_right is node
> >                                       search to parent
> >                                       parent->rb_right is node
> >                                    +->see node->rb_left changed
> > WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parennt
> > no smp_wmb(), some arch can        | |
> > reorder these two writes           | |  loop long between
> > WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
> >                                  |
> >                                  +--->finally see
> >                                       parent->rb_right
> >
> > The long loop won't stop until the modifer's CPU flushes
> > its writes. Too avoid it, we should limit the searching depth.
>
> Cute, have you actually observed this? Did you have performance issues?

I can only test it on x86 by now, which implies smp_wmb() between
writes. I haven't observed anything wrong. I'm just imagining
it on some other ARCHs.

I accidentally found this part of the code when I searched for
whether there is any renewed attempt to use rbtree with RCU, and
whether there are cases besides speculative page fault.

>
> > There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree.
> > And the max_depth of a tree is no more than 2*lg(node_count+1),
> > which is no mare than 2*BITS_PER_LONG.
> >
> > So the serarch should stop when diving down up to
> > 2*BITS_PER_LONG depth.
>
> Arguably you can have a larger key space, but I think due to memory
> constraints this limit still isn't wrong. But I do feel you need a
> comment with that.

Sure, I will add some comments about why "2*BITS_PER_LONG" in the code.

But how could it be a larger key space? There are no more than
(1<<BITS_PER_LONG) bytes in the kernel's dereferenceable address
space, and (1<<BITS_PER_LONG)/sizeof(struct rb_node) must be less than
(1<<BITS_PER_LONG)-1.


* Re: [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth
  2020-05-15 14:39   ` Lai Jiangshan
@ 2020-05-15 15:01     ` Peter Zijlstra
  2020-05-15 15:59       ` [PATCH V2 " Lai Jiangshan
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2020-05-15 15:01 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, LKML, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Mathieu Desnoyers

On Fri, May 15, 2020 at 10:39:25PM +0800, Lai Jiangshan wrote:
> On Fri, May 15, 2020 at 9:04 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, May 15, 2020 at 12:47:06PM +0000, Lai Jiangshan wrote:
> > > lib/rbtree.c has ensured that there is not possible to
> > > inadvertently cause (temporary) loops in the tree structure
> > > as seen in program order of the modifier. But loop is still
> > > possible to be seen in searcher due to CPU's reordering.
> > >
> > > for example:
> > > modifier                              searcher
> > >
> > > left rotate at parent
> > > parent->rb_right is node
> > >                                       search to parent
> > >                                       parent->rb_right is node
> > >                                    +->see node->rb_left changed
> > > WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parennt
> > > no smp_wmb(), some arch can        | |
> > > reorder these two writes           | |  loop long between
> > > WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
> > >                                  |
> > >                                  +--->finally see
> > >                                       parent->rb_right
> > >
> > > The long loop won't stop until the modifer's CPU flushes
> > > its writes. Too avoid it, we should limit the searching depth.
> >
> > Cute, have you actually observed this? Did you have performance issues?
> 
> I can only test it on x86 by now, which implies smp_wmb() between
> writes. I haven't observed any thing wrong. I'm just imaging
> it on some other ARCHs.

Note that smp_wmb() doesn't imply flushing the store-buffers. Nor does
the TSO memory model of x86 (it's the main feature that distinguishes
TSO from SC).

x86's MFENCE is a completion barrier and does imply so though.

> I accidentally found this part of code when I searched for
> whether there is any attempt again to use rbtree with RCU, and
> whether there are the cases besides speculative page fault.

It got mentioned earlier in the context of a stream of changes; an
uninterrupted modifier can basically starve a search.

But I don't think that's a problem with the current users.

> > > There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree.
> > > And the max_depth of a tree is no more than 2*lg(node_count+1),
> > > which is no mare than 2*BITS_PER_LONG.
> > >
> > > So the serarch should stop when diving down up to
> > > 2*BITS_PER_LONG depth.
> >
> > Arguably you can have a larger key space, but I think due to memory
> > constraints this limit still isn't wrong. But I do feel you need a
> > comment with that.
> 
> Sure, I will add some comments about why "2*BITS_PER_LONG" in code.
> 
> But how it could be larger key space? there are not more than
> (1<<BITS_PER_LONG) bytes in the kernel dereferencable address
> space, and (1<<BITS_PER_LONG)/sizeof(rb_node) must be less than
> (1<<BITS_PER_LONG)-1.

Well, the key space is determined by the comparator operators that are
provided, which can easily compare values that are larger than 64bit.

But yes, the address space implies limits regardless of the actual
key-space. Note that BITS_PER_LONG is not related to the actual memory
space for things like i386-PAE and ARMv7-LPAE.


* [PATCH V2 1/2] rbtree_latch: quit searching when reaching to maximum depth
  2020-05-15 15:01     ` Peter Zijlstra
@ 2020-05-15 15:59       ` Lai Jiangshan
  2020-05-15 15:59         ` [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
  2020-05-23  0:56         ` [PATCH V2 1/2] rbtree_latch: quit searching when reaching to maximum depth Lai Jiangshan
  0 siblings, 2 replies; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-15 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Peter Zijlstra, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, Rik van Riel,
	Mathieu Desnoyers

lib/rbtree.c has ensured that it is not possible to
inadvertently cause (temporary) loops in the tree structure
as seen in the program order of the modifier. But a loop can
still be seen by a searcher due to the CPU's reordering.

for example:
modifier				searcher

left rotate at parent
parent->rb_right is node
					search to parent
					parent->rb_right is node
				     +->see node->rb_left changed
WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parent
no smp_wmb(), some ARCHs can       | |
reorder these two writes           | |  loop long between
WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
				   |
				   +--->finally see
					parent->rb_right

The long loop won't stop until the modifier's CPU flushes
its writes. To avoid it, we should limit the search depth.
There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree,
and the max depth of a tree is no more than 2*lg(node_count+1),
which is no more than 2*BITS_PER_LONG.

So the search should stop when it dives down to
2*BITS_PER_LONG depth.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/rbtree_latch.h | 39 ++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 7d012faa509a..638942f53c0a 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -102,11 +102,47 @@ __lt_erase(struct latch_tree_node *ltn, struct latch_tree_root *ltr, int idx)
 	rb_erase(&ltn->node[idx], &ltr->tree[idx]);
 }
 
+/*
+ * Beware when the rbtree is searched from RCU read-side sections.
+ *
+ * lib/rbtree.c has ensured that it is not possible to
+ * inadvertently cause (temporary) loops in the tree structure
+ * as seen in the program order of the modifier. But a loop can
+ * still be seen by a searcher due to the CPU's reordering.
+ *
+ * for example:
+ * modifier				   searcher
+ *
+ * left rotate at parent		   search to parent
+ * parent->rb_right is node		   parent->rb_right is node
+ *					+->see node->rb_left changed
+ * WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parent
+ * no smp_wmb(), some ARCHs can       | |
+ * reorder these two writes           | |  loop long between
+ * WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
+ *				      |
+ *				      +--->finally see
+ *					   parent->rb_right
+ *
+ * The long loop won't stop until the searcher finally sees the changes
+ * from the modifier. To avoid it, we should limit the search depth.
+ *
+ * Limited by the address space of the kernel, there are no more than
+ * (1<<BITS_PER_LONG)-1 nodes in the tree. And the max depth of a tree is
+ * no more than 2*lg(node_count+1), which is no more than 2*BITS_PER_LONG.
+ *
+ * So the search should stop when it dives down to
+ * 2*BITS_PER_LONG depth.
+ *
+ * Note: the above problem does not occur under the TSO memory model, such
+ * as x86, which can therefore dispense with the depth check.
+ */
 static __always_inline struct latch_tree_node *
 __lt_find(void *key, struct latch_tree_root *ltr, int idx,
 	  int (*comp)(void *key, struct latch_tree_node *node))
 {
 	struct rb_node *node = rcu_dereference_raw(ltr->tree[idx].rb_node);
+	int depth = 2 * BITS_PER_LONG;
 	struct latch_tree_node *ltn;
 	int c;
 
@@ -120,6 +156,9 @@ __lt_find(void *key, struct latch_tree_root *ltr, int idx,
 			node = rcu_dereference_raw(node->rb_right);
 		else
 			return ltn;
+
+		if (!IS_ENABLED(CONFIG_X86) && (--depth < 0))
+			break;
 	}
 
 	return NULL;
-- 
2.20.1



* [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node
  2020-05-15 15:59       ` [PATCH V2 " Lai Jiangshan
@ 2020-05-15 15:59         ` Lai Jiangshan
  2020-05-16  4:27           ` Michel Lespinasse
  2020-05-23  0:56         ` [PATCH V2 1/2] rbtree_latch: quit searching when reaching to maximum depth Lai Jiangshan
  1 sibling, 1 reply; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-15 15:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, Rik van Riel,
	Mathieu Desnoyers, Peter Zijlstra

latch_tree_find() should be protected by the caller via RCU or similar.
When an attempt finds a node, that node must be a valid one from
RCU's point of view, even if the tree is (being) updated with a
new node with the same key, which is entirely subject to timing
anyway.

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/rbtree_latch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 638942f53c0a..affc4b026d9b 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -245,7 +245,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_retry(&root->seq, seq));
+	} while (!node && read_seqcount_retry(&root->seq, seq));
 
 	return node;
 }
-- 
2.20.1



* Re: [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node
  2020-05-15 15:59         ` [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
@ 2020-05-16  4:27           ` Michel Lespinasse
  2020-05-16  4:52             ` Lai Jiangshan
  0 siblings, 1 reply; 13+ messages in thread
From: Michel Lespinasse @ 2020-05-16  4:27 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Paul E . McKenney, Oleg Nesterov, Andrea Arcangeli,
	Rik van Riel, Mathieu Desnoyers, Peter Zijlstra

On Fri, May 15, 2020 at 03:59:09PM +0000, Lai Jiangshan wrote:
> latch_tree_find() should be protected by caller via RCU or so.
> When it find a node in an attempt, the node must be a valid one
> in RCU's point's of view even the tree is (being) updated with a
> new node with the same key which is entirely subject to timing
> anyway.

I'm not sure I buy this. Even if we get a valid node, is it the one we
were searching for? I don't see how this could be guaranteed if the
read raced with a tree rebalancing.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


* Re: [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node
  2020-05-16  4:27           ` Michel Lespinasse
@ 2020-05-16  4:52             ` Lai Jiangshan
  2020-05-16  5:03               ` Michel Lespinasse
  0 siblings, 1 reply; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-16  4:52 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Lai Jiangshan, LKML, Paul E . McKenney, Oleg Nesterov,
	Andrea Arcangeli, Rik van Riel, Mathieu Desnoyers,
	Peter Zijlstra

On Sat, May 16, 2020 at 12:28 PM Michel Lespinasse <walken@google.com> wrote:
>
> On Fri, May 15, 2020 at 03:59:09PM +0000, Lai Jiangshan wrote:
> > latch_tree_find() should be protected by caller via RCU or so.
> > When it find a node in an attempt, the node must be a valid one
> > in RCU's point's of view even the tree is (being) updated with a
> > new node with the same key which is entirely subject to timing
> > anyway.
>
> I'm not sure I buy this. Even if we get a valid node, is it the one we
> were searching for ? I don't see how this could be guaranteed if the
> read raced with a tree rebalancing.

It is valid because ops->comp() returns 0, and it should be
the one we were searching for unless ops->comp() is wrong.
The found node could possibly have just been deleted, but it is
still a legitimate search result from RCU's point of view.

A tree rebalancing can cause a search to fail to find
an existing target. It is the job of read_seqcount_retry()
to tell you to retry.

>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.


* Re: [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node
  2020-05-16  4:52             ` Lai Jiangshan
@ 2020-05-16  5:03               ` Michel Lespinasse
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Lespinasse @ 2020-05-16  5:03 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, LKML, Paul E . McKenney, Oleg Nesterov,
	Andrea Arcangeli, Rik van Riel, Mathieu Desnoyers,
	Peter Zijlstra

On Fri, May 15, 2020 at 9:52 PM Lai Jiangshan
<jiangshanlai+lkml@gmail.com> wrote:
>
> On Sat, May 16, 2020 at 12:28 PM Michel Lespinasse <walken@google.com> wrote:
> >
> > On Fri, May 15, 2020 at 03:59:09PM +0000, Lai Jiangshan wrote:
> > > latch_tree_find() should be protected by caller via RCU or so.
> > > When it find a node in an attempt, the node must be a valid one
> > > in RCU's point's of view even the tree is (being) updated with a
> > > new node with the same key which is entirely subject to timing
> > > anyway.
> >
> > I'm not sure I buy this. Even if we get a valid node, is it the one we
> > were searching for ? I don't see how this could be guaranteed if the
> > read raced with a tree rebalancing.
>
> It is valid because ops->comp() returns 0 and it should be
> the one we were searching for unless ops->comp() is wrong.
> The searched one could be possible just deleted, but it is still
> a legitimate searched result in RCU's point's of view.
>
> A tree rebalancing can cause a searching fails to find
> an existing target. This is the job of read_seqcount_retry()
> to tell you to retry.

Ah, yes, this is correct. It wouldn't work if we wanted to return the
next higher key for example, but it does work for exact matches. Nice!

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


* Re: [PATCH V2 1/2] rbtree_latch: quit searching when reaching to maximum depth
  2020-05-15 15:59       ` [PATCH V2 " Lai Jiangshan
  2020-05-15 15:59         ` [PATCH V2 2/2] rbtree_latch: don't need to check seq when it found a node Lai Jiangshan
@ 2020-05-23  0:56         ` Lai Jiangshan
  1 sibling, 0 replies; 13+ messages in thread
From: Lai Jiangshan @ 2020-05-23  0:56 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: LKML, Peter Zijlstra, Paul E . McKenney, Oleg Nesterov,
	Michel Lespinasse, Andrea Arcangeli, Rik van Riel,
	Mathieu Desnoyers

On Sat, May 16, 2020 at 12:01 AM Lai Jiangshan <laijs@linux.alibaba.com> wrote:
>
> lib/rbtree.c has ensured that there is not possible to
> inadvertently cause (temporary) loops in the tree structure
> as seen in program order of the modifier. But loop is still
> possible to be seen in searcher due to CPU's reordering.
>
> for example:
> modifier                                searcher
>
> left rotate at parent
> parent->rb_right is node
>                                         search to parent
>                                         parent->rb_right is node
>                                      +->see node->rb_left changed
> WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parennt
> no smp_wmb(), some ARCHs can       | |
> reorder these two writes           | |  loop long between
> WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
>                                    |
>                                    +--->finally see
>                                         parent->rb_right
>
> The long loop won't stop until the modifer's CPU flushes
> its writes. Too avoid it, we should limit the searching depth.
> There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree.
> And the max_depth of a tree is no more than 2*lg(node_count+1),
> which is no mare than 2*BITS_PER_LONG.
>
> So the searching should stop when diving down up to
> 2*BITS_PER_LONG depth.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---

Hello

Could anyone review or ack the updated patch?
Compared to v1:
  Applied Mathieu's suggestion to change the changelog
  Avoided the depth check on x86, so the patch makes no functional
    change on x86

And which tree should the patches be routed through? They are
memory-ordering related.

Thanks
Lai

>  include/linux/rbtree_latch.h | 39 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
> index 7d012faa509a..638942f53c0a 100644
> --- a/include/linux/rbtree_latch.h
> +++ b/include/linux/rbtree_latch.h
> @@ -102,11 +102,47 @@ __lt_erase(struct latch_tree_node *ltn, struct latch_tree_root *ltr, int idx)
>         rb_erase(&ltn->node[idx], &ltr->tree[idx]);
>  }
>
> +/*
> + * Beware when rbtree is being searched in RCU read sites.
> + *
> + * lib/rbtree.c has ensured that there is not possible to
> + * inadvertently cause (temporary) loops in the tree structure
> + * as seen in program order of the modifier. But loop is still
> + * possible to be seen in searcher due to CPU's reordering.
> + *
> + * for example:
> + * modifier                               searcher
> + *
> + * left rotate at parent                  search to parent
> + * parent->rb_right is node               parent->rb_right is node
> + *                                     +->see node->rb_left changed
> + * WRITE_ONCE(parent->rb_right, tmp);-+ |  node->rb_left is parennt
> + * no smp_wmb(), some ARCHs can       | |
> + * reorder these two writes           | |  loop long between
> + * WRITE_ONCE(node->rb_left, parent);-+-+  parent and node
> + *                                   |
> + *                                   +--->finally see
> + *                                        parent->rb_right
> + *
> + * The long loop won't stop until the searcher finally see the changes
> + * from the modifier. Too avoid it, we should limit the searching depth.
> + *
> + * Limited by the address space of the kernel, there are no more than
> + * (1<<BITS_PER_LONG)-1 nodes in the tree. And the max_depth of a tree is
> + * no more than 2*lg(node_count+1), which is no mare than 2*BITS_PER_LONG.
> + *
> + * So the searching should stop when diving down up to
> + * 2*BITS_PER_LONG depth.
> + *
> + * Note: the above problem is not subject to the TSO memory model, such as
> + * x86, which can dispense with the depth check.
> + */
>  static __always_inline struct latch_tree_node *
>  __lt_find(void *key, struct latch_tree_root *ltr, int idx,
>           int (*comp)(void *key, struct latch_tree_node *node))
>  {
>         struct rb_node *node = rcu_dereference_raw(ltr->tree[idx].rb_node);
> +       int depth = 2 * BITS_PER_LONG;
>         struct latch_tree_node *ltn;
>         int c;
>
> @@ -120,6 +156,9 @@ __lt_find(void *key, struct latch_tree_root *ltr, int idx,
>                         node = rcu_dereference_raw(node->rb_right);
>                 else
>                         return ltn;
> +
> +               if (!IS_ENABLED(CONFIG_X86) && (--depth < 0))
> +                       break;
>         }
>
>         return NULL;
> --
> 2.20.1
>

