linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/9] latched RB-trees and __module_address()
@ 2015-02-28 21:24 Peter Zijlstra
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
                   ` (8 more replies)
  0 siblings, 9 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz

This series is aimed at making __module_address() go fast(er).

On the way there it:
 - annotates and sanitizes module locking
 - introduces the latched RB-tree
 - employs it to make __module_address() go fast

I've build- and boot-tested this on x86_64 with modules and lockdep enabled.



* [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-01 20:09   ` Jiri Kosina
                     ` (3 more replies)
  2015-02-28 21:24 ` [RFC][PATCH 2/9] module: Sanitize RCU usage and locking Peter Zijlstra
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz, Seth Jennings,
	Josh Poimboeuf, Masami Hiramatsu, Miroslav Benes, Petr Mladek,
	Jiri Kosina

[-- Attachment #1: peterz-klp-fix-rcu-fail.patch --]
[-- Type: text/plain, Size: 1188 bytes --]

While one must hold RCU-sched (aka. preempt_disable) for find_symbol(),
one must equally hold it over the use of the object returned.

The moment you release the RCU-sched read lock, the object can be dead
and gone.
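
To make the constraint concrete, a minimal sketch of a correct caller
(the function name and error value here are hypothetical; the livepatch
hunk below follows the same pattern):

	/* Sketch only: hold RCU-sched across both lookup and use. */
	static int lookup_addr(const char *name, unsigned long *addr)
	{
		const struct kernel_symbol *sym;
		int ret = -ENOENT;

		preempt_disable();
		sym = find_symbol(name, NULL, NULL, true, true);
		if (sym) {
			*addr = sym->value;	/* still under RCU-sched */
			ret = 0;
		}
		preempt_enable();		/* only now may sym go away */

		return ret;
	}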

Cc: Seth Jennings <sjenning@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/livepatch/core.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -248,11 +248,12 @@ static int klp_find_external_symbol(stru
 	/* first, check if it's an exported symbol */
 	preempt_disable();
 	sym = find_symbol(name, NULL, NULL, true, true);
-	preempt_enable();
 	if (sym) {
 		*addr = sym->value;
+		preempt_enable();
 		return 0;
 	}
+	preempt_enable();
 
 	/* otherwise check if it's in another .o within the patch module */
 	return klp_find_object_symbol(pmod->name, name, addr);




* [RFC][PATCH 2/9] module: Sanitize RCU usage and locking
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-02 11:16   ` Rusty Russell
  2015-03-02 19:37   ` Paul E. McKenney
  2015-02-28 21:24 ` [RFC][PATCH 3/9] module: Annotate module version magic Peter Zijlstra
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz

[-- Attachment #1: peterz-module-sanitize-rcu.patch --]
[-- Type: text/plain, Size: 5463 bytes --]

Currently the RCU usage in the module code is an inconsistent mess of RCU
and RCU-sched; this is broken for CONFIG_PREEMPT, where synchronize_rcu()
does not imply synchronize_sched().

Convert everything over to RCU-sched.

Furthermore, add lockdep asserts to all sites, because it's not at all
clear to me that the required locking is observed, especially on
exported functions.
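
For illustration, a sketch of the reader/updater pairing this patch
establishes (fragments only; the identifiers are the ones touched by the
diff below):

	/* reader: RCU-sched, i.e. a preempt-disabled section */
	preempt_disable();
	mod = __module_address(addr);
	/* ... use mod ... */
	preempt_enable();

	/* updater (free_module() et al.), under module_mutex: */
	list_del_rcu(&mod->list);
	synchronize_sched();	/* waits for all preempt-disabled readers;
				 * under CONFIG_PREEMPT, synchronize_rcu()
				 * would not */
	/* mod->list may now be released */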

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/module.h |   12 ++++++++++--
 kernel/module.c        |   42 ++++++++++++++++++++++++++++++++++--------
 lib/bug.c              |    7 +++++--
 3 files changed, 49 insertions(+), 12 deletions(-)

--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -415,14 +415,22 @@ struct symsearch {
 	bool unused;
 };
 
-/* Search for an exported symbol by name. */
+/*
+ * Search for an exported symbol by name.
+ *
+ * Must be called with module_mutex held or preemption disabled.
+ */
 const struct kernel_symbol *find_symbol(const char *name,
 					struct module **owner,
 					const unsigned long **crc,
 					bool gplok,
 					bool warn);
 
-/* Walk the exported symbol table */
+/*
+ * Walk the exported symbol table
+ *
+ * Must be called with module_mutex held or preemption disabled.
+ */
 bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
 				    struct module *owner,
 				    void *data), void *data);
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -106,6 +106,24 @@ static LIST_HEAD(modules);
 struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
 #endif /* CONFIG_KGDB_KDB */
 
+static inline void module_assert_mutex(void)
+{
+	lockdep_assert_held(&module_mutex);
+}
+
+static inline void module_assert_mutex_or_preempt(void)
+{
+#ifdef CONFIG_LOCKDEP
+	int rcu_held = rcu_read_lock_sched_held();
+	int mutex_held = 1;
+
+	if (debug_locks)
+		mutex_held = lockdep_is_held(&module_mutex);
+
+	WARN_ON(!rcu_held && !mutex_held);
+#endif
+}
+
 #ifdef CONFIG_MODULE_SIG
 #ifdef CONFIG_MODULE_SIG_FORCE
 static bool sig_enforce = true;
@@ -319,6 +337,8 @@ bool each_symbol_section(bool (*fn)(cons
 #endif
 	};
 
+	module_assert_mutex_or_preempt();
+
 	if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
 		return true;
 
@@ -458,6 +478,8 @@ static struct module *find_module_all(co
 {
 	struct module *mod;
 
+	module_assert_mutex();
+
 	list_for_each_entry(mod, &modules, list) {
 		if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
 			continue;
@@ -1856,8 +1878,8 @@ static void free_module(struct module *m
 	list_del_rcu(&mod->list);
 	/* Remove this module from bug list, this uses list_del_rcu */
 	module_bug_cleanup(mod);
-	/* Wait for RCU synchronizing before releasing mod->list and buglist. */
-	synchronize_rcu();
+	/* Wait for RCU-sched synchronizing before releasing mod->list and buglist. */
+	synchronize_sched();
 	mutex_unlock(&module_mutex);
 
 	/* This may be NULL, but that's OK */
@@ -3106,11 +3128,11 @@ static noinline int do_init_module(struc
 	mod->init_text_size = 0;
 	/*
 	 * We want to free module_init, but be aware that kallsyms may be
-	 * walking this with preempt disabled.  In all the failure paths,
-	 * we call synchronize_rcu/synchronize_sched, but we don't want
-	 * to slow down the success path, so use actual RCU here.
+	 * walking this with preempt disabled.  In all the failure paths, we
+	 * call synchronize_sched, but we don't want to slow down the success
+	 * path, so use actual RCU here.
 	 */
-	call_rcu(&freeinit->rcu, do_free_init);
+	call_rcu_sched(&freeinit->rcu, do_free_init);
 	mutex_unlock(&module_mutex);
 	wake_up_all(&module_wq);
 
@@ -3368,8 +3390,8 @@ static int load_module(struct load_info
 	/* Unlink carefully: kallsyms could be walking list. */
 	list_del_rcu(&mod->list);
 	wake_up_all(&module_wq);
-	/* Wait for RCU synchronizing before releasing mod->list. */
-	synchronize_rcu();
+	/* Wait for RCU-sched synchronizing before releasing mod->list. */
+	synchronize_sched();
 	mutex_unlock(&module_mutex);
  free_module:
 	/* Free lock-classes; relies on the preceding sync_rcu() */
@@ -3636,6 +3658,8 @@ int module_kallsyms_on_each_symbol(int (
 	unsigned int i;
 	int ret;
 
+	module_assert_mutex();
+
 	list_for_each_entry(mod, &modules, list) {
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
@@ -3810,6 +3834,8 @@ struct module *__module_address(unsigned
 	if (addr < module_addr_min || addr > module_addr_max)
 		return NULL;
 
+	module_assert_mutex_or_preempt();
+
 	list_for_each_entry_rcu(mod, &modules, list) {
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -66,7 +66,7 @@ static const struct bug_entry *module_fi
 	struct module *mod;
 	const struct bug_entry *bug = NULL;
 
-	rcu_read_lock();
+	rcu_read_lock_sched();
 	list_for_each_entry_rcu(mod, &module_bug_list, bug_list) {
 		unsigned i;
 
@@ -77,7 +77,7 @@ static const struct bug_entry *module_fi
 	}
 	bug = NULL;
 out:
-	rcu_read_unlock();
+	rcu_read_unlock_sched();
 
 	return bug;
 }
@@ -88,6 +88,8 @@ void module_bug_finalize(const Elf_Ehdr
 	char *secstrings;
 	unsigned int i;
 
+	lockdep_assert_held(&module_mutex);
+
 	mod->bug_table = NULL;
 	mod->num_bugs = 0;
 
@@ -113,6 +115,7 @@ void module_bug_finalize(const Elf_Ehdr
 
 void module_bug_cleanup(struct module *mod)
 {
+	lockdep_assert_held(&module_mutex);
 	list_del_rcu(&mod->bug_list);
 }
 




* [RFC][PATCH 3/9] module: Annotate module version magic
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
  2015-02-28 21:24 ` [RFC][PATCH 2/9] module: Sanitize RCU usage and locking Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-02 19:38   ` Paul E. McKenney
  2015-02-28 21:24 ` [RFC][PATCH 4/9] module, jump_label: Fix module locking Peter Zijlstra
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz

[-- Attachment #1: peterz-module-fix-load_module.patch --]
[-- Type: text/plain, Size: 2707 bytes --]

Due to the new lockdep checks we trip the following warning:

[    9.759380] ------------[ cut here ]------------
[    9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130()
[    9.759391] Modules linked in:
[    9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65
[    9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    9.759396]  ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001
[    9.759398]  0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8
[    9.759400]  0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000
[    9.759400] Call Trace:
[    9.759407]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
[    9.759410]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
[    9.759412]  [<ffffffff810ec160>] ? section_objs+0x60/0x60
[    9.759414]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
[    9.759415]  [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130
[    9.759417]  [<ffffffff810eda01>] find_symbol+0x31/0x70
[    9.759420]  [<ffffffff810ef5bf>] load_module+0x20f/0x2660
[    9.759422]  [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0
[    9.759426]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759427]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759433]  [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
[    9.759437]  [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    9.759439]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759441]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
[    9.759443]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
[    9.759445] ---[ end trace 9294429076a9c644 ]---

As per the comment, this site should be fine, but let's wrap it in
preempt_disable() anyhow to placate lockdep.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/module.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1192,11 +1192,17 @@ static inline int check_modstruct_versio
 {
 	const unsigned long *crc;
 
-	/* Since this should be found in kernel (which can't be removed),
-	 * no locking is necessary. */
+	/*
+	 * Since this should be found in kernel (which can't be removed),
+	 * no locking is necessary.
+	 */
+	preempt_disable();
 	if (!find_symbol(VMLINUX_SYMBOL_STR(module_layout), NULL,
-			 &crc, true, false))
+			 &crc, true, false)) {
+		preempt_enable();
 		BUG();
+	}
+	preempt_enable();
 	return check_version(sechdrs, versindex,
 			     VMLINUX_SYMBOL_STR(module_layout), mod, crc,
 			     NULL);




* [RFC][PATCH 4/9] module, jump_label: Fix module locking
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
                   ` (2 preceding siblings ...)
  2015-02-28 21:24 ` [RFC][PATCH 3/9] module: Annotate module version magic Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-02 19:39   ` Paul E. McKenney
  2015-02-28 21:24 ` [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal Peter Zijlstra
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz, Jason Baron

[-- Attachment #1: peterz-module-fix-jump_label.patch --]
[-- Type: text/plain, Size: 3071 bytes --]

As per the module core lockdep annotations:

[   18.034047] ---[ end trace 9294429076a9c673 ]---
[   18.047760] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   18.059228]  ffffffff817d8676 ffff880036683c38 ffffffff8157e98b 0000000000000001
[   18.067541]  0000000000000000 ffff880036683c78 ffffffff8105fbc7 ffff880036683c68
[   18.075851]  ffffffffa0046b08 0000000000000000 ffffffffa0046d00 ffffffffa0046cc8
[   18.084173] Call Trace:
[   18.086906]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
[   18.092649]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
[   18.099361]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
[   18.105880]  [<ffffffff810ee502>] __module_address+0x1d2/0x1e0
[   18.112400]  [<ffffffff81161153>] jump_label_module_notify+0x143/0x1e0
[   18.119710]  [<ffffffff810814bf>] notifier_call_chain+0x4f/0x70
[   18.126326]  [<ffffffff8108160e>] __blocking_notifier_call_chain+0x5e/0x90
[   18.134009]  [<ffffffff81081656>] blocking_notifier_call_chain+0x16/0x20
[   18.141490]  [<ffffffff810f0f00>] load_module+0x1b50/0x2660
[   18.147720]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
[   18.154045]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
[   18.160748] ---[ end trace 9294429076a9c674 ]---

The jump label code is not doing it right; fix this.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/jump_label.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -280,6 +280,17 @@ void jump_label_apply_nops(struct module
 	}
 }
 
+static inline bool address_in_module(unsigned long addr, struct module *mod)
+{
+	bool ret;
+
+	preempt_disable();
+	ret = __module_address(addr) == mod;
+	preempt_enable();
+
+	return ret;
+}
+
 static int jump_label_add_module(struct module *mod)
 {
 	struct jump_entry *iter_start = mod->jump_entries;
@@ -302,7 +313,7 @@ static int jump_label_add_module(struct
 			continue;
 
 		key = iterk;
-		if (__module_address(iter->key) == mod) {
+		if (address_in_module(iter->key, mod)) {
 			/*
 			 * Set key->entries to iter, but preserve JUMP_LABEL_TRUE_BRANCH.
 			 */
@@ -339,7 +350,7 @@ static void jump_label_del_module(struct
 
 		key = (struct static_key *)(unsigned long)iter->key;
 
-		if (__module_address(iter->key) == mod)
+		if (address_in_module(iter->key, mod))
 			continue;
 
 		prev = &key->next;
@@ -443,14 +454,16 @@ static void jump_label_update(struct sta
 {
 	struct jump_entry *stop = __stop___jump_table;
 	struct jump_entry *entry = jump_label_get_entries(key);
-
 #ifdef CONFIG_MODULES
-	struct module *mod = __module_address((unsigned long)key);
+	struct module *mod;
 
 	__jump_label_mod_update(key, enable);
 
+	preempt_disable();
+	mod = __module_address((unsigned long)key);
 	if (mod)
 		stop = mod->jump_entries + mod->num_jump_entries;
+	preempt_enable();
 #endif
 	/* if there are no users, entry can be NULL */
 	if (entry)




* [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
                   ` (3 preceding siblings ...)
  2015-02-28 21:24 ` [RFC][PATCH 4/9] module, jump_label: Fix module locking Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-01 13:52   ` Mathieu Desnoyers
  2015-03-01 21:11   ` Michel Lespinasse
  2015-02-28 21:24 ` [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz, Michel Lespinasse,
	Andrea Arcangeli, David Woodhouse, Rik van Riel

[-- Attachment #1: peterz-rbtree-lockless.patch --]
[-- Type: text/plain, Size: 9922 bytes --]

Change the insert and erase code such that lockless searches are
non-fatal.

In and of itself, an rbtree cannot be correctly searched while it is
being modified; we can, however, provide weaker guarantees that allow
the rbtree to be used in conjunction with other techniques, such as
latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").

For this to work we need the following guarantees from the rbtree
code:

 1) a lockless reader must not see partial stores; these would allow it
    to observe nodes that are invalid memory.

 2) there must not be (temporary) loops in the tree structure in the
    modifier's program order; these would cause a lookup which
    interrupts the modifier to get stuck indefinitely.

For 1) we must use WRITE_ONCE() for all updates to the tree structure;
in particular this patch only does rb_{left,right} as those are the
only elements required for simple searches.

This generates slightly worse code, probably because gcc stinks at
volatile. But in pointer-chasing-heavy code a few extra instructions
should not matter.

For 2) I have carefully audited the code and drawn every intermediate
link state and not found a loop.
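
For illustration, the kind of lockless descent these guarantees are
meant to keep safe; a sketch only, with cmp() as a hypothetical
comparator (the actual users additionally wrap this in a latch, see the
later patches):

	/*
	 * May miss nodes while rotations are in flight, but per 1) and
	 * 2) above it will neither dereference garbage nor loop forever.
	 */
	struct rb_node *n = rcu_dereference_raw(root->rb_node);

	while (n) {
		int c = cmp(key, n);

		if (c < 0)
			n = rcu_dereference_raw(n->rb_left);
		else if (c > 0)
			n = rcu_dereference_raw(n->rb_right);
		else
			return n;
	}
	return NULL;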

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/rbtree.h           |   10 +++++
 include/linux/rbtree_augmented.h |   21 +++++++----
 lib/rbtree.c                     |   71 ++++++++++++++++++++++++++-------------
 3 files changed, 73 insertions(+), 29 deletions(-)

--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -31,6 +31,7 @@
 
 #include <linux/kernel.h>
 #include <linux/stddef.h>
+#include <linux/rcupdate.h>
 
 struct rb_node {
 	unsigned long  __rb_parent_color;
@@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
 	*rb_link = node;
 }
 
+static inline void rb_link_node_rcu(struct rb_node * node, struct rb_node * parent,
+				    struct rb_node ** rb_link)
+{
+	node->__rb_parent_color = (unsigned long)parent;
+	node->rb_left = node->rb_right = NULL;
+
+	rcu_assign_pointer(*rb_link, node);
+}
+
 #define rb_entry_safe(ptr, type, member) \
 	({ typeof(ptr) ____ptr = (ptr); \
 	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
 {
 	if (parent) {
 		if (parent->rb_left == old)
-			parent->rb_left = new;
+			WRITE_ONCE(parent->rb_left, new);
 		else
-			parent->rb_right = new;
+			WRITE_ONCE(parent->rb_right, new);
 	} else
-		root->rb_node = new;
+		WRITE_ONCE(root->rb_node, new);
 }
 
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
@@ -137,7 +137,8 @@ static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		     const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *child = node->rb_right, *tmp = node->rb_left;
+	struct rb_node *child = node->rb_right;
+	struct rb_node *tmp = node->rb_left;
 	struct rb_node *parent, *rebalance;
 	unsigned long pc;
 
@@ -167,6 +168,7 @@ __rb_erase_augmented(struct rb_node *nod
 		tmp = parent;
 	} else {
 		struct rb_node *successor = child, *child2;
+
 		tmp = child->rb_left;
 		if (!tmp) {
 			/*
@@ -180,6 +182,7 @@ __rb_erase_augmented(struct rb_node *nod
 			 */
 			parent = successor;
 			child2 = successor->rb_right;
+
 			augment->copy(node, successor);
 		} else {
 			/*
@@ -201,19 +204,23 @@ __rb_erase_augmented(struct rb_node *nod
 				successor = tmp;
 				tmp = tmp->rb_left;
 			} while (tmp);
-			parent->rb_left = child2 = successor->rb_right;
-			successor->rb_right = child;
+			child2 = successor->rb_right;
+			WRITE_ONCE(parent->rb_left, child2);
+			WRITE_ONCE(successor->rb_right, child);
 			rb_set_parent(child, successor);
+
 			augment->copy(node, successor);
 			augment->propagate(parent, successor);
 		}
 
-		successor->rb_left = tmp = node->rb_left;
+		tmp = node->rb_left;
+		WRITE_ONCE(successor->rb_left, tmp);
 		rb_set_parent(tmp, successor);
 
 		pc = node->__rb_parent_color;
 		tmp = __rb_parent(pc);
 		__rb_change_child(node, successor, tmp, root);
+
 		if (child2) {
 			successor->__rb_parent_color = pc;
 			rb_set_parent_color(child2, parent, RB_BLACK);
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,6 +44,25 @@
  *  parentheses and have some accompanying text comment.
  */
 
+/*
+ * All stores to the tree structure (rb_left and rb_right) must be done using
+ * WRITE_ONCE(). And we must not inadvertently cause (temporary) loops in the
+ * tree structure as seen in program order.
+ *
+ * These two requirements will allow lockless iteration of the tree -- not
+ * correct iteration mind you, tree rotations are not atomic so a lookup might
+ * miss entire subtrees.
+ *
+ * But they do guarantee that any such traversal will only see valid elements
+ * and that it will indeed complete -- does not get stuck in a loop.
+ *
+ * NOTE:
+ *
+ * Stores to __rb_parent_color are not important for simple lookups so those
+ * are left undone as of now. Nor did I check for loops involving parent
+ * pointers.
+ */
+
 static inline void rb_set_black(struct rb_node *rb)
 {
 	rb->__rb_parent_color |= RB_BLACK;
@@ -129,8 +148,9 @@ __rb_insert(struct rb_node *node, struct
 				 * This still leaves us in violation of 4), the
 				 * continuation into Case 3 will fix that.
 				 */
-				parent->rb_right = tmp = node->rb_left;
-				node->rb_left = parent;
+				tmp = node->rb_left;
+				WRITE_ONCE(parent->rb_right, tmp);
+				WRITE_ONCE(node->rb_left, parent);
 				if (tmp)
 					rb_set_parent_color(tmp, parent,
 							    RB_BLACK);
@@ -149,8 +169,8 @@ __rb_insert(struct rb_node *node, struct
 			 *     /                 \
 			 *    n                   U
 			 */
-			gparent->rb_left = tmp;  /* == parent->rb_right */
-			parent->rb_right = gparent;
+			WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
+			WRITE_ONCE(parent->rb_right, gparent);
 			if (tmp)
 				rb_set_parent_color(tmp, gparent, RB_BLACK);
 			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
@@ -171,8 +191,9 @@ __rb_insert(struct rb_node *node, struct
 			tmp = parent->rb_left;
 			if (node == tmp) {
 				/* Case 2 - right rotate at parent */
-				parent->rb_left = tmp = node->rb_right;
-				node->rb_right = parent;
+				tmp = node->rb_right;
+				WRITE_ONCE(parent->rb_left, tmp);
+				WRITE_ONCE(node->rb_right, parent);
 				if (tmp)
 					rb_set_parent_color(tmp, parent,
 							    RB_BLACK);
@@ -183,8 +204,8 @@ __rb_insert(struct rb_node *node, struct
 			}
 
 			/* Case 3 - left rotate at gparent */
-			gparent->rb_right = tmp;  /* == parent->rb_left */
-			parent->rb_left = gparent;
+			WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
+			WRITE_ONCE(parent->rb_left, gparent);
 			if (tmp)
 				rb_set_parent_color(tmp, gparent, RB_BLACK);
 			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
@@ -224,8 +245,9 @@ ____rb_erase_color(struct rb_node *paren
 				 *      / \         / \
 				 *     Sl  Sr      N   Sl
 				 */
-				parent->rb_right = tmp1 = sibling->rb_left;
-				sibling->rb_left = parent;
+				tmp1 = sibling->rb_left;
+				WRITE_ONCE(parent->rb_right, tmp1);
+				WRITE_ONCE(sibling->rb_left, parent);
 				rb_set_parent_color(tmp1, parent, RB_BLACK);
 				__rb_rotate_set_parents(parent, sibling, root,
 							RB_RED);
@@ -275,9 +297,10 @@ ____rb_erase_color(struct rb_node *paren
 				 *                       \
 				 *                        Sr
 				 */
-				sibling->rb_left = tmp1 = tmp2->rb_right;
-				tmp2->rb_right = sibling;
-				parent->rb_right = tmp2;
+				tmp1 = tmp2->rb_right;
+				WRITE_ONCE(sibling->rb_left, tmp1);
+				WRITE_ONCE(tmp2->rb_right, sibling);
+				WRITE_ONCE(parent->rb_right, tmp2);
 				if (tmp1)
 					rb_set_parent_color(tmp1, sibling,
 							    RB_BLACK);
@@ -297,8 +320,9 @@ ____rb_erase_color(struct rb_node *paren
 			 *        / \         / \
 			 *      (sl) sr      N  (sl)
 			 */
-			parent->rb_right = tmp2 = sibling->rb_left;
-			sibling->rb_left = parent;
+			tmp2 = sibling->rb_left;
+			WRITE_ONCE(parent->rb_right, tmp2);
+			WRITE_ONCE(sibling->rb_left, parent);
 			rb_set_parent_color(tmp1, sibling, RB_BLACK);
 			if (tmp2)
 				rb_set_parent(tmp2, parent);
@@ -310,8 +334,9 @@ ____rb_erase_color(struct rb_node *paren
 			sibling = parent->rb_left;
 			if (rb_is_red(sibling)) {
 				/* Case 1 - right rotate at parent */
-				parent->rb_left = tmp1 = sibling->rb_right;
-				sibling->rb_right = parent;
+				tmp1 = sibling->rb_right;
+				WRITE_ONCE(parent->rb_left, tmp1);
+				WRITE_ONCE(sibling->rb_right, parent);
 				rb_set_parent_color(tmp1, parent, RB_BLACK);
 				__rb_rotate_set_parents(parent, sibling, root,
 							RB_RED);
@@ -336,9 +361,10 @@ ____rb_erase_color(struct rb_node *paren
 					break;
 				}
 				/* Case 3 - right rotate at sibling */
-				sibling->rb_right = tmp1 = tmp2->rb_left;
-				tmp2->rb_left = sibling;
-				parent->rb_left = tmp2;
+				tmp1 = tmp2->rb_left;
+				WRITE_ONCE(sibling->rb_right, tmp1);
+				WRITE_ONCE(tmp2->rb_left, sibling);
+				WRITE_ONCE(parent->rb_left, tmp2);
 				if (tmp1)
 					rb_set_parent_color(tmp1, sibling,
 							    RB_BLACK);
@@ -347,8 +373,9 @@ ____rb_erase_color(struct rb_node *paren
 				sibling = tmp2;
 			}
 			/* Case 4 - left rotate at parent + color flips */
-			parent->rb_left = tmp2 = sibling->rb_right;
-			sibling->rb_right = parent;
+			tmp2 = sibling->rb_right;
+			WRITE_ONCE(parent->rb_left, tmp2);
+			WRITE_ONCE(sibling->rb_right, parent);
 			rb_set_parent_color(tmp1, sibling, RB_BLACK);
 			if (tmp2)
 				rb_set_parent(tmp2, parent);




* [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
                   ` (4 preceding siblings ...)
  2015-02-28 21:24 ` [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-01 14:02   ` Mathieu Desnoyers
  2015-03-01 21:12   ` Michel Lespinasse
  2015-02-28 21:24 ` [RFC][PATCH 7/9] rbtree: Implement generic latch_tree Peter Zijlstra
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz, Michel Lespinasse,
	Andrea Arcangeli, David Woodhouse, Rik van Riel

[-- Attachment #1: peterz-latch-comment.patch --]
[-- Type: text/plain, Size: 4807 bytes --]

Improve the documentation of the latch technique as used in the
current timekeeping code, such that it can be readily employed
elsewhere.

Borrow from the comments in timekeeping and replace those with a
reference to this more generic comment.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/seqlock.h   |   74 +++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time/timekeeping.c |   27 ----------------
 2 files changed, 74 insertions(+), 27 deletions(-)

--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -233,9 +233,81 @@ static inline void raw_write_seqcount_en
 	s->sequence++;
 }
 
-/*
+/**
  * raw_write_seqcount_latch - redirect readers to even/odd copy
  * @s: pointer to seqcount_t
+ *
+ * The latch technique is a multiversion concurrency control method that allows
+ * queries during non atomic modifications. If you can guarantee queries never
+ * interrupt the modification -- e.g. the concurrency is strictly between CPUs
+ * -- you most likely do not need this.
+ *
+ * Where the traditional RCU/lockless data structures rely on atomic
+ * modifications to ensure queries observe either the old or the new state the
+ * latch allows the same for non atomic updates. The trade-off is doubling the
+ * cost of storage; we have to maintain two copies of the entire data
+ * structure.
+ *
+ * Very simply put: we first modify one copy and then the other. This ensures
+ * there is always one copy in a stable state, ready to give us an answer.
+ *
+ * The basic form is a data structure like:
+ *
+ * struct latch_struct {
+ *	seqcount_t		seq;
+ *	struct data_struct	data[2];
+ * };
+ *
+ * Where a modification, which is assumed to be externally serialized, does the
+ * following:
+ *
+ * void latch_modify(struct latch_struct *latch, ...)
+ * {
+ *	smp_wmb();	<- Ensure that the last data[1] update is visible
+ *	latch->seq++;
+ *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *
+ *	modify(latch->data[0], ...);
+ *
+ *	smp_wmb();	<- Ensure that the data[0] update is visible
+ *	latch->seq++;
+ *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *
+ *	modify(latch->data[1], ...);
+ * }
+ *
+ * The query will have a form like:
+ *
+ * struct entry *latch_query(struct latch_struct *latch, ...)
+ * {
+ *	struct entry *entry;
+ *	unsigned seq;
+ *	int idx;
+ *
+ *	do {
+ *		seq = latch->seq;
+ *		smp_rmb();
+ *
+ *		idx = seq & 0x01;
+ *		entry = data_query(latch->data[idx], ...);
+ *
+ *		smp_rmb();
+ *	} while (seq != latch->seq);
+ *
+ *	return entry;
+ * }
+ *
+ * So during the modification, queries are first redirected to data[1]. Then we
+ * modify data[0]. When that is complete, we redirect queries back to data[0]
+ * and we can modify data[1].
+ *
+ * NOTE: The non-requirement for atomic modifications does _NOT_ include
+ *       the publishing of new entries in the case where data is a dynamic
+ *       data structure.
+ *
+ *       An iteration might start in data[0] and get suspended long enough
+ *       to miss an entire modification sequence, once it resumes it might
+ *       observe the new entry.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -235,32 +235,7 @@ static inline s64 timekeeping_get_ns_raw
  * We want to use this from any context including NMI and tracing /
  * instrumenting the timekeeping code itself.
  *
- * So we handle this differently than the other timekeeping accessor
- * functions which retry when the sequence count has changed. The
- * update side does:
- *
- * smp_wmb();	<- Ensure that the last base[1] update is visible
- * tkf->seq++;
- * smp_wmb();	<- Ensure that the seqcount update is visible
- * update(tkf->base[0], tkr);
- * smp_wmb();	<- Ensure that the base[0] update is visible
- * tkf->seq++;
- * smp_wmb();	<- Ensure that the seqcount update is visible
- * update(tkf->base[1], tkr);
- *
- * The reader side does:
- *
- * do {
- *	seq = tkf->seq;
- *	smp_rmb();
- *	idx = seq & 0x01;
- *	now = now(tkf->base[idx]);
- *	smp_rmb();
- * } while (seq != tkf->seq)
- *
- * As long as we update base[0] readers are forced off to
- * base[1]. Once base[0] is updated readers are redirected to base[0]
- * and the base[1] update takes place.
+ * Employ the latch technique; see @raw_write_seqcount_latch.
  *
  * So if a NMI hits the update of base[0] then it will use base[1]
  * which is still consistent. In the worst case this can result is a




* [RFC][PATCH 7/9] rbtree: Implement generic latch_tree
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
                   ` (5 preceding siblings ...)
  2015-02-28 21:24 ` [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-03-01 21:17   ` Michel Lespinasse
  2015-03-02 19:53   ` Paul E. McKenney
  2015-02-28 21:24 ` [RFC][PATCH 8/9] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
  2015-02-28 21:24 ` [RFC][PATCH 9/9] module: Use __module_address() for module_address_lookup() Peter Zijlstra
  8 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz, Michel Lespinasse,
	Andrea Arcangeli, David Woodhouse, Rik van Riel

[-- Attachment #1: peterz-rbtree-latch.patch --]
[-- Type: text/plain, Size: 4347 bytes --]

Implement a latched RB-tree in order to get RCU style lookups.
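
For context, a minimal usage sketch of the API this header introduces;
the enclosing structure, comparators and root below are hypothetical,
patch 8/9 is the real user:

	struct my_obj {
		unsigned long		start, size;
		struct latch_tree_nodes	nodes;
	};

	static struct latch_tree_root my_root;

	static bool my_less(struct latch_tree_node *a, struct latch_tree_node *b)
	{
		struct my_obj *oa = a->priv, *ob = b->priv;

		return oa->start < ob->start;
	}

	static int my_comp(void *key, struct latch_tree_node *n)
	{
		unsigned long addr = (unsigned long)key;
		struct my_obj *obj = n->priv;

		if (addr < obj->start)
			return -1;
		if (addr >= obj->start + obj->size)
			return 1;
		return 0;
	}

	static const struct latch_tree_ops my_ops = {
		.less = my_less,
		.comp = my_comp,
	};

	/* modifications must be externally serialized, e.g. by a mutex */
	latch_tree_insert(&obj->nodes, &my_root, obj, &my_ops);

	/* lockless lookup; obj lifetime must be ensured, e.g. via RCU-sched */
	ltn = latch_tree_find((void *)addr, &my_root, &my_ops);
	obj = ltn ? ltn->priv : NULL;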

Cc: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/rbtree_latch.h |  140 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)

--- /dev/null
+++ b/include/linux/rbtree_latch.h
@@ -0,0 +1,140 @@
+/*
+ * Latched RB-trees
+ *
+ * Copyright (C) 2015 Intel Corp., Peter Zijlstra <peterz@infradead.org>
+ */
+
+#ifndef RB_TREE_LATCH_H
+#define RB_TREE_LATCH_H
+
+#include <linux/rbtree.h>
+#include <linux/seqlock.h>
+
+/*
+ * Since RB-trees have non atomic modifications they're not suited for
+ * RCU/lockless queries.
+ *
+ * Employ the latch technique -- see @raw_write_seqcount_latch -- to implement
+ * a latched RB-tree which does allow this by virtue of always having (at
+ * least) one stable copy of the tree.
+ *
+ * However, while we have the guarantee that there is at all times one stable
+ * copy, this does not guarantee an iteration will not observe modifications.
+ * What might have been a stable copy at the start of the iteration, need not
+ * remain so for the duration of the iteration.
+ *
+ * Therefore, this does require a lockless RB-tree iteration to be non-fatal in
+ * all circumstances; see the comment in lib/rbtree.c.
+ */
+
+struct latch_tree_node {
+	void		*priv;
+	struct rb_node	node;
+};
+
+struct latch_tree_nodes {
+	struct latch_tree_node node[2];
+};
+
+struct latch_tree_root {
+	seqcount_t	seq;
+	struct rb_root	tree[2];
+};
+
+struct latch_tree_ops {
+	bool (*less)(struct latch_tree_node *a, struct latch_tree_node *b);
+	int  (*comp)(void *key,                 struct latch_tree_node *b);
+};
+
+static __always_inline void
+__lt_insert(struct latch_tree_node *ltn, struct rb_root *root,
+	    bool (*less)(struct latch_tree_node *a, struct latch_tree_node *b))
+{
+	struct rb_node **link = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct latch_tree_node *ltp;
+
+	while (*link) {
+		parent = *link;
+		ltp = container_of(parent, struct latch_tree_node, node);
+
+		if (less(ltn, ltp))
+			link = &parent->rb_left;
+		else
+			link = &parent->rb_right;
+	}
+
+	rb_link_node_rcu(&ltn->node, parent, link);
+	rb_insert_color(&ltn->node, root);
+}
+
+static __always_inline void
+__lt_erase(struct latch_tree_node *ltn, struct rb_root *root)
+{
+	rb_erase(&ltn->node, root);
+}
+
+static __always_inline struct latch_tree_node *
+__lt_find(void *key, struct rb_root *root,
+	  int (*comp)(void *key, struct latch_tree_node *ltn))
+{
+	struct rb_node *n = rcu_dereference_raw(root->rb_node);
+	struct latch_tree_node *ltn;
+	int c;
+
+	while (n) {
+		ltn = container_of(n, struct latch_tree_node, node);
+		c = comp(key, ltn);
+
+		if (c < 0)
+			n = rcu_dereference_raw(n->rb_left);
+		else if (c > 0)
+			n = rcu_dereference_raw(n->rb_right);
+		else
+			return ltn;
+	}
+
+	return NULL;
+}
+
+static __always_inline void
+latch_tree_insert(struct latch_tree_nodes *nodes,
+		  struct latch_tree_root *root,
+		  void *priv,
+		  const struct latch_tree_ops *ops)
+{
+	nodes->node[0].priv = nodes->node[1].priv = priv;
+
+	raw_write_seqcount_latch(&root->seq);
+	__lt_insert(&nodes->node[0], &root->tree[0], ops->less);
+	raw_write_seqcount_latch(&root->seq);
+	__lt_insert(&nodes->node[1], &root->tree[1], ops->less);
+}
+
+static __always_inline void
+latch_tree_erase(struct latch_tree_nodes *nodes,
+		 struct latch_tree_root *root,
+		 const struct latch_tree_ops *ops)
+{
+	raw_write_seqcount_latch(&root->seq);
+	__lt_erase(&nodes->node[0], &root->tree[0]);
+	raw_write_seqcount_latch(&root->seq);
+	__lt_erase(&nodes->node[1], &root->tree[1]);
+}
+
+static __always_inline struct latch_tree_node *
+latch_tree_find(void *key, struct latch_tree_root *root,
+		const struct latch_tree_ops *ops)
+{
+	struct latch_tree_node *node;
+	unsigned int seq;
+
+	do {
+		seq = raw_read_seqcount(&root->seq);
+		node = __lt_find(key, &root->tree[seq & 1], ops->comp);
+	} while (read_seqcount_retry(&root->seq, seq));
+
+	return node;
+}
+
+#endif /* RB_TREE_LATCH_H */




* [RFC][PATCH 8/9] module: Optimize __module_address() using a latched RB-tree
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
                   ` (6 preceding siblings ...)
  2015-02-28 21:24 ` [RFC][PATCH 7/9] rbtree: Implement generic latch_tree Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  2015-02-28 21:24 ` [RFC][PATCH 9/9] module: Use __module_address() for module_address_lookup() Peter Zijlstra
  8 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz

[-- Attachment #1: peterz-module-latch-tree.patch --]
[-- Type: text/plain, Size: 6634 bytes --]

Currently __module_address() uses a linear search through all modules
to find the module corresponding to the provided address. With many
modules this can take a long time.

One of the users of this is kernel_text_address(), which is employed by
many stack unwinders, which in turn are used by perf-callchain and
ftrace (possibly from NMI context).

So by optimizing __module_address() we optimize many stack unwinders,
which are used by both perf and tracing in performance-sensitive code.
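
As a sketch of the intended usage from such callers (the helper below is
hypothetical; it mirrors address_in_module() from patch 4/9):

	static bool addr_in_any_module(unsigned long addr)
	{
		bool ret;

		/*
		 * NMI context already runs with preemption disabled; other
		 * contexts must disable it explicitly, as the lockdep
		 * annotation from patch 2/9 enforces.
		 */
		preempt_disable();
		ret = __module_address(addr) != NULL;
		preempt_enable();

		return ret;
	}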

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/module.h |   22 ++++++++--
 kernel/module.c        |  107 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 121 insertions(+), 8 deletions(-)

--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -17,6 +17,7 @@
 #include <linux/moduleparam.h>
 #include <linux/jump_label.h>
 #include <linux/export.h>
+#include <linux/rbtree_latch.h>
 
 #include <linux/percpu.h>
 #include <asm/module.h>
@@ -269,8 +270,15 @@ struct module {
 	/* Startup function. */
 	int (*init)(void);
 
-	/* If this is non-NULL, vfree after init() returns */
-	void *module_init;
+	/*
+	 * If this is non-NULL, vfree after init() returns.
+	 *
+	 * Cacheline align here, such that:
+	 *   module_init, module_core, init_size, core_size,
+	 *   init_text_size, core_text_size and ltn_core.node[0]
+	 * are on the same cacheline.
+	 */
+	void *module_init	____cacheline_aligned;
 
 	/* Here is the actual code + data, vfree'd on unload. */
 	void *module_core;
@@ -281,6 +289,14 @@ struct module {
 	/* The size of the executable code in each section.  */
 	unsigned int init_text_size, core_text_size;
 
+	/*
+	 * We rely on the order of these two entries; not only do we want
+	 * core.node[0] to be in the same cacheline as the above entries,
+	 * we also assume ltn_init has a higher address than ltn_core.
+	 */
+	struct latch_tree_nodes	ltn_core;
+	struct latch_tree_nodes	ltn_init;
+
 	/* Size of RO sections of the module (text+rodata) */
 	unsigned int init_ro_size, core_ro_size;
 
@@ -361,7 +377,7 @@ struct module {
 	ctor_fn_t *ctors;
 	unsigned int num_ctors;
 #endif
-};
+} ____cacheline_aligned;
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
 #endif
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -102,6 +102,99 @@
 DEFINE_MUTEX(module_mutex);
 EXPORT_SYMBOL_GPL(module_mutex);
 static LIST_HEAD(modules);
+
+/*
+ * Use a latched RB-tree for __module_address(); this allows us to use
+ * RCU-sched lookups of the address from any context.
+ *
+ * Because modules have two address ranges: init and core, we need two
+ * latch_tree_nodes entries. We use the order they appear in struct module to
+ * determine if we need to use the init or core values for the comparisons.
+ */
+
+static __always_inline unsigned long __mod_tree_val(struct latch_tree_node *n)
+{
+	struct module *mod = n->priv;
+
+	if (n >= mod->ltn_init.node)
+		return (unsigned long)mod->module_init;
+	else
+		return (unsigned long)mod->module_core;
+}
+
+static __always_inline unsigned long __mod_tree_size(struct latch_tree_node *n)
+{
+	struct module *mod = n->priv;
+
+	if (n >= mod->ltn_init.node)
+		return (unsigned long)mod->init_size;
+	else
+		return (unsigned long)mod->core_size;
+}
+
+static __always_inline bool
+mod_tree_less(struct latch_tree_node *a, struct latch_tree_node *b)
+{
+	return __mod_tree_val(a) < __mod_tree_val(b);
+}
+
+static __always_inline int
+mod_tree_comp(void *key, struct latch_tree_node *n)
+{
+	unsigned long val = (unsigned long)key;
+	unsigned long start, end;
+
+	end = start = __mod_tree_val(n);
+	end += __mod_tree_size(n);
+
+	if (val < start)
+		return -1;
+
+	if (val >= end)
+		return 1;
+
+	return 0;
+}
+
+static const struct latch_tree_ops mod_tree_ops = {
+	.less = mod_tree_less,
+	.comp = mod_tree_comp,
+};
+
+static struct latch_tree_root mod_tree __cacheline_aligned;
+
+/*
+ * These modifications: insert, remove_init and remove; are serialized by the
+ * module_mutex.
+ */
+static void mod_tree_insert(struct module *mod)
+{
+	latch_tree_insert(&mod->ltn_core, &mod_tree, mod, &mod_tree_ops);
+	if (mod->init_size)
+		latch_tree_insert(&mod->ltn_init, &mod_tree, mod, &mod_tree_ops);
+}
+
+static void mod_tree_remove_init(struct module *mod)
+{
+	if (mod->init_size)
+		latch_tree_erase(&mod->ltn_init, &mod_tree, &mod_tree_ops);
+}
+
+static void mod_tree_remove(struct module *mod)
+{
+	latch_tree_erase(&mod->ltn_core, &mod_tree, &mod_tree_ops);
+	mod_tree_remove_init(mod);
+}
+
+static struct module *mod_tree_find(unsigned long addr)
+{
+	struct latch_tree_node *ltn;
+
+	ltn = latch_tree_find((void *)addr, &mod_tree, &mod_tree_ops);
+
+	return ltn ? ltn->priv : NULL;
+}
+
 #ifdef CONFIG_KGDB_KDB
 struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
 #endif /* CONFIG_KGDB_KDB */
@@ -1876,6 +1969,7 @@ static void free_module(struct module *m
 	mutex_lock(&module_mutex);
 	/* Unlink carefully: kallsyms could be walking list. */
 	list_del_rcu(&mod->list);
+	mod_tree_remove(mod);
 	/* Remove this module from bug list, this uses list_del_rcu */
 	module_bug_cleanup(mod);
 	/* Wait for RCU-sched synchronizing before releasing mod->list and buglist. */
@@ -3120,6 +3214,7 @@ static noinline int do_init_module(struc
 	mod->symtab = mod->core_symtab;
 	mod->strtab = mod->core_strtab;
 #endif
+	mod_tree_remove_init(mod);
 	unset_module_init_ro_nx(mod);
 	module_arch_freeing_init(mod);
 	mod->module_init = NULL;
@@ -3190,6 +3285,7 @@ static int add_unformed_module(struct mo
 		goto out;
 	}
 	list_add_rcu(&mod->list, &modules);
+	mod_tree_insert(mod);
 	err = 0;
 
 out:
@@ -3389,6 +3485,7 @@ static int load_module(struct load_info
 	mutex_lock(&module_mutex);
 	/* Unlink carefully: kallsyms could be walking list. */
 	list_del_rcu(&mod->list);
+	mod_tree_remove(mod);
 	wake_up_all(&module_wq);
 	/* Wait for RCU-sched synchronizing before releasing mod->list. */
 	synchronize_sched();
@@ -3833,13 +3930,13 @@ struct module *__module_address(unsigned
 
 	module_assert_mutex_or_preempt();
 
-	list_for_each_entry_rcu(mod, &modules, list) {
+	mod = mod_tree_find(addr);
+	if (mod) {
+		BUG_ON(!within_module(addr, mod));
 		if (mod->state == MODULE_STATE_UNFORMED)
-			continue;
-		if (within_module(addr, mod))
-			return mod;
+			mod = NULL;
 	}
-	return NULL;
+	return mod;
 }
 EXPORT_SYMBOL_GPL(__module_address);
 




* [RFC][PATCH 9/9] module: Use __module_address() for module_address_lookup()
  2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
                   ` (7 preceding siblings ...)
  2015-02-28 21:24 ` [RFC][PATCH 8/9] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
@ 2015-02-28 21:24 ` Peter Zijlstra
  8 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-02-28 21:24 UTC (permalink / raw)
  To: mingo, rusty, mathieu.desnoyers, oleg, paulmck
  Cc: linux-kernel, andi, rostedt, tglx, peterz

[-- Attachment #1: peterz-module-use-__module_address.patch --]
[-- Type: text/plain, Size: 1127 bytes --]

Use the generic __module_address() addr-to-struct-module lookup
instead of open coding it once more.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/module.c |   17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3515,19 +3515,15 @@ const char *module_address_lookup(unsign
 			    char **modname,
 			    char *namebuf)
 {
-	struct module *mod;
 	const char *ret = NULL;
+	struct module *mod;
 
 	preempt_disable();
-	list_for_each_entry_rcu(mod, &modules, list) {
-		if (mod->state == MODULE_STATE_UNFORMED)
-			continue;
-		if (within_module(addr, mod)) {
-			if (modname)
-				*modname = mod->name;
-			ret = get_ksymbol(mod, addr, size, offset);
-			break;
-		}
+	mod = __module_address(addr);
+	if (mod) {
+		if (modname)
+			*modname = mod->name;
+		ret = get_ksymbol(mod, addr, size, offset);
 	}
 	/* Make a copy in here where it's safe */
 	if (ret) {
@@ -3535,6 +3531,7 @@ const char *module_address_lookup(unsign
 		ret = namebuf;
 	}
 	preempt_enable();
+
 	return ret;
 }
 




* Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-02-28 21:24 ` [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal Peter Zijlstra
@ 2015-03-01 13:52   ` Mathieu Desnoyers
  2015-03-02  8:27     ` Peter Zijlstra
  2015-03-01 21:11   ` Michel Lespinasse
  1 sibling, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2015-03-01 13:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, oleg, paulmck, linux-kernel, andi, rostedt, tglx,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

----- Original Message -----
> From: "Peter Zijlstra" <peterz@infradead.org>
> To: mingo@kernel.org, rusty@rustcorp.com.au, "mathieu desnoyers" <mathieu.desnoyers@efficios.com>, oleg@redhat.com,
> paulmck@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org, andi@firstfloor.org, rostedt@goodmis.org, tglx@linutronix.de, peterz@infradead.org,
> "Michel Lespinasse" <walken@google.com>, "Andrea Arcangeli" <aarcange@redhat.com>, "David Woodhouse"
> <David.Woodhouse@intel.com>, "Rik van Riel" <riel@redhat.com>
> Sent: Saturday, February 28, 2015 4:24:52 PM
> Subject: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
> 
> Change the insert and erase code such that lockless searches are
> non-fatal.
> 
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> 
> For this to work we need the following guarantees from the rbtree
> code:
> 
>  1) a lockless reader must not see partial stores, this would allow it
>     to observe nodes that are invalid memory.
> 
>  2) there must not be (temporary) loops in the tree structure in the
>     modifier's program order, this would cause a lookup which
>     interrupts the modifier to get stuck indefinitely.

For (2), I don't think this is how the situation should be described.

Let's consider a scenario where an interrupt nests over the modification.
First, the modification will switch the latch to the other version of the
tree, so the interrupt will see a fully consistent tree, not the tree
being modified. Therefore, a temporary loop in the tree should not be an
issue in that particular situation.

However, if we have another thread traversing the tree while we
concurrently switch the latch and modify one version of the tree,
creating a temporary loop in the tree, this thread could possibly:

A) deadlock with the modification if there is a locking dependency
   between tree modification, tree read, and another lock (transitive
   dependency).
B) have the modifier starved by the other thread, if that thread has
   a higher scheduling priority (e.g. RT) than the modifier. The
   high-priority thread would then spend all its CPU time spinning in
   the temporary loop.

So I agree that loops are unwanted there: avoiding them means we never
have to care about situations A and B. However, the explanation of why
should not, AFAIU, involve an interrupt handler nesting over the tree
modification, because that is precisely one scenario that does not need
to care about loops.

Thoughts?

Thanks!

Mathieu


> 
> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
> 
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.
> 
> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/rbtree.h           |   10 +++++
>  include/linux/rbtree_augmented.h |   21 +++++++----
>  lib/rbtree.c                     |   71
>  ++++++++++++++++++++++++++-------------
>  3 files changed, 73 insertions(+), 29 deletions(-)
> 
> --- a/include/linux/rbtree.h
> +++ b/include/linux/rbtree.h
> @@ -31,6 +31,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/stddef.h>
> +#include <linux/rcupdate.h>
>  
>  struct rb_node {
>  	unsigned long  __rb_parent_color;
> @@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
>  	*rb_link = node;
>  }
>  
> +static inline void rb_link_node_rcu(struct rb_node * node, struct rb_node *
> parent,
> +				    struct rb_node ** rb_link)
> +{
> +	node->__rb_parent_color = (unsigned long)parent;
> +	node->rb_left = node->rb_right = NULL;
> +
> +	rcu_assign_pointer(*rb_link, node);
> +}
> +
>  #define rb_entry_safe(ptr, type, member) \
>  	({ typeof(ptr) ____ptr = (ptr); \
>  	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
> --- a/include/linux/rbtree_augmented.h
> +++ b/include/linux/rbtree_augmented.h
> @@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
>  {
>  	if (parent) {
>  		if (parent->rb_left == old)
> -			parent->rb_left = new;
> +			WRITE_ONCE(parent->rb_left, new);
>  		else
> -			parent->rb_right = new;
> +			WRITE_ONCE(parent->rb_right, new);
>  	} else
> -		root->rb_node = new;
> +		WRITE_ONCE(root->rb_node, new);
>  }
>  
>  extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
> @@ -137,7 +137,8 @@ static __always_inline struct rb_node *
>  __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>  		     const struct rb_augment_callbacks *augment)
>  {
> -	struct rb_node *child = node->rb_right, *tmp = node->rb_left;
> +	struct rb_node *child = node->rb_right;
> +	struct rb_node *tmp = node->rb_left;
>  	struct rb_node *parent, *rebalance;
>  	unsigned long pc;
>  
> @@ -167,6 +168,7 @@ __rb_erase_augmented(struct rb_node *nod
>  		tmp = parent;
>  	} else {
>  		struct rb_node *successor = child, *child2;
> +
>  		tmp = child->rb_left;
>  		if (!tmp) {
>  			/*
> @@ -180,6 +182,7 @@ __rb_erase_augmented(struct rb_node *nod
>  			 */
>  			parent = successor;
>  			child2 = successor->rb_right;
> +
>  			augment->copy(node, successor);
>  		} else {
>  			/*
> @@ -201,19 +204,23 @@ __rb_erase_augmented(struct rb_node *nod
>  				successor = tmp;
>  				tmp = tmp->rb_left;
>  			} while (tmp);
> -			parent->rb_left = child2 = successor->rb_right;
> -			successor->rb_right = child;
> +			child2 = successor->rb_right;
> +			WRITE_ONCE(parent->rb_left, child2);
> +			WRITE_ONCE(successor->rb_right, child);
>  			rb_set_parent(child, successor);
> +
>  			augment->copy(node, successor);
>  			augment->propagate(parent, successor);
>  		}
>  
> -		successor->rb_left = tmp = node->rb_left;
> +		tmp = node->rb_left;
> +		WRITE_ONCE(successor->rb_left, tmp);
>  		rb_set_parent(tmp, successor);
>  
>  		pc = node->__rb_parent_color;
>  		tmp = __rb_parent(pc);
>  		__rb_change_child(node, successor, tmp, root);
> +
>  		if (child2) {
>  			successor->__rb_parent_color = pc;
>  			rb_set_parent_color(child2, parent, RB_BLACK);
> --- a/lib/rbtree.c
> +++ b/lib/rbtree.c
> @@ -44,6 +44,25 @@
>   *  parentheses and have some accompanying text comment.
>   */
>  
> +/*
> + * All stores to the tree structure (rb_left and rb_right) must be done
> using
> + * WRITE_ONCE(). And we must not inadvertently cause (temporary) loops in
> the
> + * tree structure as seen in program order.
> + *
> + * These two requirements will allow lockless iteration of the tree -- not
> + * correct iteration mind you, tree rotations are not atomic so a lookup
> might
> + * miss entire subtrees.
> + *
> + * But they do guarantee that any such traversal will only see valid
> elements
> + * and that it will indeed complete -- does not get stuck in a loop.
> + *
> + * NOTE:
> + *
> + * Stores to __rb_parent_color are not important for simple lookups so those
> + * are left undone as of now. Nor did I check for loops involving parent
> + * pointers.
> + */
> +
>  static inline void rb_set_black(struct rb_node *rb)
>  {
>  	rb->__rb_parent_color |= RB_BLACK;
> @@ -129,8 +148,9 @@ __rb_insert(struct rb_node *node, struct
>  				 * This still leaves us in violation of 4), the
>  				 * continuation into Case 3 will fix that.
>  				 */
> -				parent->rb_right = tmp = node->rb_left;
> -				node->rb_left = parent;
> +				tmp = node->rb_left;
> +				WRITE_ONCE(parent->rb_right, tmp);
> +				WRITE_ONCE(node->rb_left, parent);
>  				if (tmp)
>  					rb_set_parent_color(tmp, parent,
>  							    RB_BLACK);
> @@ -149,8 +169,8 @@ __rb_insert(struct rb_node *node, struct
>  			 *     /                 \
>  			 *    n                   U
>  			 */
> -			gparent->rb_left = tmp;  /* == parent->rb_right */
> -			parent->rb_right = gparent;
> +			WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
> +			WRITE_ONCE(parent->rb_right, gparent);
>  			if (tmp)
>  				rb_set_parent_color(tmp, gparent, RB_BLACK);
>  			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
> @@ -171,8 +191,9 @@ __rb_insert(struct rb_node *node, struct
>  			tmp = parent->rb_left;
>  			if (node == tmp) {
>  				/* Case 2 - right rotate at parent */
> -				parent->rb_left = tmp = node->rb_right;
> -				node->rb_right = parent;
> +				tmp = node->rb_right;
> +				WRITE_ONCE(parent->rb_left, tmp);
> +				WRITE_ONCE(node->rb_right, parent);
>  				if (tmp)
>  					rb_set_parent_color(tmp, parent,
>  							    RB_BLACK);
> @@ -183,8 +204,8 @@ __rb_insert(struct rb_node *node, struct
>  			}
>  
>  			/* Case 3 - left rotate at gparent */
> -			gparent->rb_right = tmp;  /* == parent->rb_left */
> -			parent->rb_left = gparent;
> +			WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
> +			WRITE_ONCE(parent->rb_left, gparent);
>  			if (tmp)
>  				rb_set_parent_color(tmp, gparent, RB_BLACK);
>  			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
> @@ -224,8 +245,9 @@ ____rb_erase_color(struct rb_node *paren
>  				 *      / \         / \
>  				 *     Sl  Sr      N   Sl
>  				 */
> -				parent->rb_right = tmp1 = sibling->rb_left;
> -				sibling->rb_left = parent;
> +				tmp1 = sibling->rb_left;
> +				WRITE_ONCE(parent->rb_right, tmp1);
> +				WRITE_ONCE(sibling->rb_left, parent);
>  				rb_set_parent_color(tmp1, parent, RB_BLACK);
>  				__rb_rotate_set_parents(parent, sibling, root,
>  							RB_RED);
> @@ -275,9 +297,10 @@ ____rb_erase_color(struct rb_node *paren
>  				 *                       \
>  				 *                        Sr
>  				 */
> -				sibling->rb_left = tmp1 = tmp2->rb_right;
> -				tmp2->rb_right = sibling;
> -				parent->rb_right = tmp2;
> +				tmp1 = tmp2->rb_right;
> +				WRITE_ONCE(sibling->rb_left, tmp1);
> +				WRITE_ONCE(tmp2->rb_right, sibling);
> +				WRITE_ONCE(parent->rb_right, tmp2);
>  				if (tmp1)
>  					rb_set_parent_color(tmp1, sibling,
>  							    RB_BLACK);
> @@ -297,8 +320,9 @@ ____rb_erase_color(struct rb_node *paren
>  			 *        / \         / \
>  			 *      (sl) sr      N  (sl)
>  			 */
> -			parent->rb_right = tmp2 = sibling->rb_left;
> -			sibling->rb_left = parent;
> +			tmp2 = sibling->rb_left;
> +			WRITE_ONCE(parent->rb_right, tmp2);
> +			WRITE_ONCE(sibling->rb_left, parent);
>  			rb_set_parent_color(tmp1, sibling, RB_BLACK);
>  			if (tmp2)
>  				rb_set_parent(tmp2, parent);
> @@ -310,8 +334,9 @@ ____rb_erase_color(struct rb_node *paren
>  			sibling = parent->rb_left;
>  			if (rb_is_red(sibling)) {
>  				/* Case 1 - right rotate at parent */
> -				parent->rb_left = tmp1 = sibling->rb_right;
> -				sibling->rb_right = parent;
> +				tmp1 = sibling->rb_right;
> +				WRITE_ONCE(parent->rb_left, tmp1);
> +				WRITE_ONCE(sibling->rb_right, parent);
>  				rb_set_parent_color(tmp1, parent, RB_BLACK);
>  				__rb_rotate_set_parents(parent, sibling, root,
>  							RB_RED);
> @@ -336,9 +361,10 @@ ____rb_erase_color(struct rb_node *paren
>  					break;
>  				}
>  				/* Case 3 - right rotate at sibling */
> -				sibling->rb_right = tmp1 = tmp2->rb_left;
> -				tmp2->rb_left = sibling;
> -				parent->rb_left = tmp2;
> +				tmp1 = tmp2->rb_left;
> +				WRITE_ONCE(sibling->rb_right, tmp1);
> +				WRITE_ONCE(tmp2->rb_left, sibling);
> +				WRITE_ONCE(parent->rb_left, tmp2);
>  				if (tmp1)
>  					rb_set_parent_color(tmp1, sibling,
>  							    RB_BLACK);
> @@ -347,8 +373,9 @@ ____rb_erase_color(struct rb_node *paren
>  				sibling = tmp2;
>  			}
>  			/* Case 4 - left rotate at parent + color flips */
> -			parent->rb_left = tmp2 = sibling->rb_right;
> -			sibling->rb_right = parent;
> +			tmp2 = sibling->rb_right;
> +			WRITE_ONCE(parent->rb_left, tmp2);
> +			WRITE_ONCE(sibling->rb_right, parent);
>  			rb_set_parent_color(tmp1, sibling, RB_BLACK);
>  			if (tmp2)
>  				rb_set_parent(tmp2, parent);
> 
> 
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
  2015-02-28 21:24 ` [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
@ 2015-03-01 14:02   ` Mathieu Desnoyers
  2015-03-02  8:33     ` Peter Zijlstra
  2015-03-01 21:12   ` Michel Lespinasse
  1 sibling, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2015-03-01 14:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, oleg, paulmck, linux-kernel, andi, rostedt, tglx,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

----- Original Message -----
> From: "Peter Zijlstra" <peterz@infradead.org>
> To: mingo@kernel.org, rusty@rustcorp.com.au, "mathieu desnoyers" <mathieu.desnoyers@efficios.com>, oleg@redhat.com,
> paulmck@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org, andi@firstfloor.org, rostedt@goodmis.org, tglx@linutronix.de, peterz@infradead.org,
> "Michel Lespinasse" <walken@google.com>, "Andrea Arcangeli" <aarcange@redhat.com>, "David Woodhouse"
> <David.Woodhouse@intel.com>, "Rik van Riel" <riel@redhat.com>
> Sent: Saturday, February 28, 2015 4:24:53 PM
> Subject: [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
> 
> Improve the documentation of the latch technique as used in the
> current timekeeping code, such that it can be readily employed
> elsewhere.
> 
> Borrow from the comments in timekeeping and replace those with a
> reference to this more generic comment.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/seqlock.h   |   74 +++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/time/timekeeping.c |   27 ----------------
>  2 files changed, 74 insertions(+), 27 deletions(-)
> 
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -233,9 +233,81 @@ static inline void raw_write_seqcount_en
>  	s->sequence++;
>  }
>  
> -/*
> +/**
>   * raw_write_seqcount_latch - redirect readers to even/odd copy
>   * @s: pointer to seqcount_t
> + *
> + * The latch technique is a multiversion concurrency control method that allows
> + * queries during non atomic modifications. If you can guarantee queries never
> + * interrupt the modification -- e.g. the concurrency is strictly between CPUs
> + * -- you most likely do not need this.
> + *
> + * Where the traditional RCU/lockless data structures rely on atomic
> + * modifications to ensure queries observe either the old or the new state the
> + * latch allows the same for non atomic updates. The trade-off is doubling the
> + * cost of storage; we have to maintain two copies of the entire data
> + * structure.
> + *
> + * Very simply put: we first modify one copy and then the other. This ensures
> + * there is always one copy in a stable state, ready to give us an answer.
> + *
> + * The basic form is a data structure like:
> + *
> + * struct latch_struct {
> + *	seqcount_t		seq;
> + *	struct data_struct	data[2];
> + * };
> + *
> + * Where a modification, which is assumed to be externally serialized, does the
> + * following:
> + *
> + * void latch_modify(struct latch_struct *latch, ...)
> + * {
> + *	smp_wmb();	<- Ensure that the last data[1] update is visible
> + *	latch->seq++;
> + *	smp_wmb();	<- Ensure that the seqcount update is visible
> + *
> + *	modify(latch->data[0], ...);
> + *
> + *	smp_wmb();	<- Ensure that the data[0] update is visible
> + *	latch->seq++;
> + *	smp_wmb();	<- Ensure that the seqcount update is visible
> + *
> + *	modify(latch->data[1], ...);
> + * }
> + *
> + * The query will have a form like:
> + *
> + * struct entry *latch_query(struct latch_struct *latch, ...)
> + * {
> + *	struct entry *entry;
> + *	unsigned seq;
> + *	int idx;
> + *
> + *	do {
> + *		seq = latch->seq;
> + *		smp_rmb();
> + *
> + *		idx = seq & 0x01;
> + *		entry = data_query(latch->data[idx], ...);
> + *
> + *		smp_rmb();
> + *	} while (seq != latch->seq);
> + *
> + *	return entry;
> + * }
> + *
> + * So during the modification, queries are first redirected to data[1]. Then we
> + * modify data[0]. When that is complete, we redirect queries back to data[0]
> + * and we can modify data[1].
> + *
> + * NOTE: The non-requirement for atomic modifications does _NOT_ include
> + *       the publishing of new entries in the case where data is a dynamic
> + *       data structure.
> + *
> + *       An iteration might start in data[0] and get suspended long enough
> + *       to miss an entire modification sequence, once it resumes it might
> + *       observe the new entry.

We might want to hint that in the case of dynamic data structures,
RCU read-side C.S. and grace period should be used together with the
latch to handle the object teardown.

The latch, AFAIU, takes care of making sure the new objects are
initialized before being published into the data structure, so there
would be no need to use RCU assign pointer. However, we really need
RCU around reads, along with a grace period between removal of an object
and its teardown.
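
A minimal sketch of what I have in mind, reusing the hypothetical
latch_modify()/latch_query() helpers from the comment above (names and
layout are illustrative only, not part of the patch):

struct obj {
	struct rb_node node[2];		/* one rb_node per tree copy */
	unsigned long key;
	unsigned long value;
};

/* Reader: the latch picks a stable tree copy, RCU keeps the object alive. */
unsigned long obj_read_value(struct latch_struct *latch, unsigned long key)
{
	struct obj *obj;
	unsigned long val = 0;

	rcu_read_lock();
	obj = latch_query(latch, key);
	if (obj)
		val = obj->value;	/* only touch obj inside the read-side C.S. */
	rcu_read_unlock();

	return val;
}

/* Writer: unlink from both copies, then wait a grace period before freeing. */
void obj_remove(struct latch_struct *latch, struct obj *obj)
{
	latch_modify(latch, obj);	/* drops obj from data[0] and data[1] */
	synchronize_rcu();		/* all lockless readers are done with obj */
	kfree(obj);
}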

Thanks,

Mathieu

>   */
>  static inline void raw_write_seqcount_latch(seqcount_t *s)
>  {
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -235,32 +235,7 @@ static inline s64 timekeeping_get_ns_raw
>   * We want to use this from any context including NMI and tracing /
>   * instrumenting the timekeeping code itself.
>   *
> - * So we handle this differently than the other timekeeping accessor
> - * functions which retry when the sequence count has changed. The
> - * update side does:
> - *
> - * smp_wmb();	<- Ensure that the last base[1] update is visible
> - * tkf->seq++;
> - * smp_wmb();	<- Ensure that the seqcount update is visible
> - * update(tkf->base[0], tkr);
> - * smp_wmb();	<- Ensure that the base[0] update is visible
> - * tkf->seq++;
> - * smp_wmb();	<- Ensure that the seqcount update is visible
> - * update(tkf->base[1], tkr);
> - *
> - * The reader side does:
> - *
> - * do {
> - *	seq = tkf->seq;
> - *	smp_rmb();
> - *	idx = seq & 0x01;
> - *	now = now(tkf->base[idx]);
> - *	smp_rmb();
> - * } while (seq != tkf->seq)
> - *
> - * As long as we update base[0] readers are forced off to
> - * base[1]. Once base[0] is updated readers are redirected to base[0]
> - * and the base[1] update takes place.
> + * Employ the latch technique; see @raw_write_seqcount_latch.
>   *
>   * So if a NMI hits the update of base[0] then it will use base[1]
>   * which is still consistent. In the worst case this can result is a
> 
> 
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
@ 2015-03-01 20:09   ` Jiri Kosina
  2015-03-02  8:35     ` Peter Zijlstra
  2015-03-02  1:31   ` Masami Hiramatsu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2015-03-01 20:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Seth Jennings, Josh Poimboeuf,
	Masami Hiramatsu, Miroslav Benes, Petr Mladek

On Sat, 28 Feb 2015, Peter Zijlstra wrote:

> While one must hold RCU-sched (aka. preempt_disable) for find_symbol()
> one must equally hold it over the use of the object returned.
> 
> The moment you release the RCU-sched read lock, the object can be dead
> and gone.
> 
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Miroslav Benes <mbenes@suse.cz>
> Cc: Petr Mladek <pmladek@suse.cz>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Jiri Kosina <jkosina@suse.cz>

I guess you'll be taking this together with the series, so I am not 
applying it.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-02-28 21:24 ` [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal Peter Zijlstra
  2015-03-01 13:52   ` Mathieu Desnoyers
@ 2015-03-01 21:11   ` Michel Lespinasse
  2015-03-02  7:46     ` Ingo Molnar
  2015-03-02  8:23     ` Peter Zijlstra
  1 sibling, 2 replies; 39+ messages in thread
From: Michel Lespinasse @ 2015-03-01 21:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Josh Triplett

On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Change the insert and erase code such that lockless searches are
> non-fatal.
>
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
>
> For this to work we need the following guarantees from the rbtree
> code:
>
>  1) a lockless reader must not see partial stores, this would allow it
>     to observe nodes that are invalid memory.
>
>  2) there must not be (temporary) loops in the tree structure in the
>     modifier's program order, this would cause a lookup which
>     interrupts the modifier to get stuck indefinitely.
>
> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
>
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.

So, I was worried that this would penalize all rbtree users, for the
benefit of just the one you're adding later in this series. We have
several rbtrees where we care about performance a lot, such as the
ones used in the scheduler or for indexing vmas.

That said, I checked with the compiler we are using here (gcc 4.7
variant) and I didn't see any change in the generated code. So, no
issue here for me.

If the object code really is different in your setup, please use the
lib/rbtree_test module to check the performance impact of the change.

> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.

As Mathieu remarked, we are never modifying the currently active tree,
so the interrupt case is not the reason for avoiding loops.


I think your proposal will work well for the use case you have in mind
(looking up modules based on address). However, I was wondering how
you would compare your proposal against an alternative I heard Josh
Triplett formulate before, where there would be one unique rbtree but
rotations would allocate new nodes rather than modify the existing
ones. I think this would be workable as well; I'm just not sure
whether this would be more or less generally applicable than your
proposal. Copying Josh in case he wants to chime in.

Reviewed-by: Michel Lespinasse <walken@google.com>

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
  2015-02-28 21:24 ` [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
  2015-03-01 14:02   ` Mathieu Desnoyers
@ 2015-03-01 21:12   ` Michel Lespinasse
  1 sibling, 0 replies; 39+ messages in thread
From: Michel Lespinasse @ 2015-03-01 21:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Improve the documentation of the latch technique as used in the
> current timekeeping code, such that it can be readily employed
> elsewhere.
>
> Borrow from the comments in timekeeping and replace those with a
> reference to this more generic comment.

Acked-by: Michel Lespinasse <walken@google.com>

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 7/9] rbtree: Implement generic latch_tree
  2015-02-28 21:24 ` [RFC][PATCH 7/9] rbtree: Implement generic latch_tree Peter Zijlstra
@ 2015-03-01 21:17   ` Michel Lespinasse
  2015-03-02  8:05     ` Peter Zijlstra
  2015-03-02 19:53   ` Paul E. McKenney
  1 sibling, 1 reply; 39+ messages in thread
From: Michel Lespinasse @ 2015-03-01 21:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Implement a latched RB-tree in order to get RCU style lookups.

Looks fine to me.

I wanted to ask, you are writing this as a generic layer even though
there is a single use at the moment; do you anticipate this will get
more usage soon? I might have more comments if I knew what those might be
(it's probably all fine if the rate of changes is very low such as
with module load/unload, but I'm not sure what else you have in mind).

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
  2015-03-01 20:09   ` Jiri Kosina
@ 2015-03-02  1:31   ` Masami Hiramatsu
  2015-03-02 19:21   ` Paul E. McKenney
  2015-03-02 21:07   ` Josh Poimboeuf
  3 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2015-03-02  1:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Seth Jennings, Josh Poimboeuf,
	Miroslav Benes, Petr Mladek, Jiri Kosina

(2015/03/01 6:24), Peter Zijlstra wrote:
> While one must hold RCU-sched (aka. preempt_disable) for find_symbol()
> one must equally hold it over the use of the object returned.
> 
> The moment you release the RCU-sched read lock, the object can be dead
> and gone.
> 
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Miroslav Benes <mbenes@suse.cz>
> Cc: Petr Mladek <pmladek@suse.cz>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Looks good to me.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you,

> ---
>  kernel/livepatch/core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -248,11 +248,12 @@ static int klp_find_external_symbol(stru
>  	/* first, check if it's an exported symbol */
>  	preempt_disable();
>  	sym = find_symbol(name, NULL, NULL, true, true);
> -	preempt_enable();
>  	if (sym) {
>  		*addr = sym->value;
> +		preempt_enable();
>  		return 0;
>  	}
> +	preempt_enable();
>  
>  	/* otherwise check if it's in another .o within the patch module */
>  	return klp_find_object_symbol(pmod->name, name, addr);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-03-01 21:11   ` Michel Lespinasse
@ 2015-03-02  7:46     ` Ingo Molnar
  2015-03-02  8:23     ` Peter Zijlstra
  1 sibling, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2015-03-02  7:46 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Peter Zijlstra, rusty, mathieu.desnoyers, oleg, paulmck,
	linux-kernel, andi, rostedt, tglx, Andrea Arcangeli,
	David Woodhouse, Rik van Riel, Josh Triplett


* Michel Lespinasse <walken@google.com> wrote:

> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Change the insert and erase code such that lockless searches are
> > non-fatal.
> >
> > In and of itself an rbtree cannot be correctly searched while
> > in-modification, we can however provide weaker guarantees that will
> > allow the rbtree to be used in conjunction with other techniques, such
> > as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> >
> > For this to work we need the following guarantees from the rbtree
> > code:
> >
> >  1) a lockless reader must not see partial stores, this would allow it
> >     to observe nodes that are invalid memory.
> >
> >  2) there must not be (temporary) loops in the tree structure in the
> >     modifier's program order, this would cause a lookup which
> >     interrupts the modifier to get stuck indefinitely.
> >
> > For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> > in particular this patch only does rb_{left,right} as those are the
> > only element required for simple searches.
> >
> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer chasing heavy code a few instructions more
> > should not matter.
> 
> So, I was worried that this would penalize all rbtree users, for the 
> benefit of just the one you're adding later in this series. We have 
> several rbtrees where we care about performance a lot, such as the 
> ones used in the scheduler or for indexing vmas.
> 
> That said, I checked with the compiler we are using here (gcc 4.7 
> variant) and I didn't see any change in the generated code. So, no 
> issue here for me.

Yeah, I had that worry too. With gcc 4.9.1 on x86-64 defconfig I get 
this comparison:

lib/rbtree.o:

   text    data     bss     dec     hex filename
   3299       0       0    3299     ce3 rbtree.o.before
   3308       0       0    3308     cec rbtree.o.after

+9 bytes.

Most of the difference seems minimal; it involves an extra register move,
saving addresses in registers and using them there:

  449:  4c 8b 60 10             mov    0x10(%rax),%r12
- 44d:  49 39 fc                cmp    %rdi,%r12
- 450:  0f 84 aa 00 00 00       je     500 <__rb_insert_augmented+0x1a0>
- 456:  49 89 c6                mov    %rax,%r14
- 459:  4d 85 e4                test   %r12,%r12
- 45c:  4c 89 63 08             mov    %r12,0x8(%rbx)
- 460:  48 89 58 10             mov    %rbx,0x10(%rax)
- 464:  74 0b                   je     471 <__rb_insert_augmented+0x111>


  449:  4c 8b 60 10             mov    0x10(%rax),%r12
+ 44d:  48 89 c6                mov    %rax,%rsi
+ 450:  49 39 fc                cmp    %rdi,%r12
+ 453:  0f 84 97 00 00 00       je     4f0 <__rb_insert_augmented+0x190>
+ 459:  49 89 c6                mov    %rax,%r14
+ 45c:  4d 85 e4                test   %r12,%r12
+ 45f:  4c 89 63 08             mov    %r12,0x8(%rbx)
+ 463:  48 89 5e 10             mov    %rbx,0x10(%rsi)
+ 467:  74 0b                   je     474 <__rb_insert_augmented+0x114>

So here instead of using 0x10(%rax) again, GCC moved %rax into %rsi 
and used 0x10(%rsi). That seems to be plain compiler stupidity (that 
move to %rsi is senseless with or without volatile), and gcc might 
improve in the future.

In my (admittedly quick) comparison I saw no serious changes like 
extra memory loads or stack spills.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 7/9] rbtree: Implement generic latch_tree
  2015-03-01 21:17   ` Michel Lespinasse
@ 2015-03-02  8:05     ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  8:05 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

On Sun, Mar 01, 2015 at 01:17:15PM -0800, Michel Lespinasse wrote:
> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Implement a latched RB-tree in order to get RCU style lookups.
> 
> Looks fine to me.
> 
> I wanted to ask, you are writing this as a generic layer even though
> there is a single use at the moment; do you anticipate this will get
> more usage soon ? I may have more comments if I knew what those may be
> (it's probably all fine if the rate of changes is very low such as
> with module load/unload, but I'm not sure what else you have in mind).

It was more an exercise in separation in order to better document the
various parts of this solution.

That said, we could maybe also use this for uprobes. Uprobes have a
global RB tree and every lookup is currently serialized by a spinlock.
I've not yet heard that it's really painful, but I imagine a heavy uprobe
workload on big machines will eventually hurt.

And through the years there have been 'demands' for RCU RB-trees, so
maybe there's more potential users out there.

Note that while the update is more expensive than with a single tree,
it's still of the same complexity, so it's not horrendously more
expensive.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-03-01 21:11   ` Michel Lespinasse
  2015-03-02  7:46     ` Ingo Molnar
@ 2015-03-02  8:23     ` Peter Zijlstra
  2015-03-02  9:53       ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  8:23 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Josh Triplett

On Sun, Mar 01, 2015 at 01:11:23PM -0800, Michel Lespinasse wrote:
> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:

> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer chasing heavy code a few instructions more
> > should not matter.
> 
> So, I was worried that this would penalize all rbtree users, for the
> benefit of just the one you're adding later in this series. We have
> several rbtrees where we care about performance a lot, such as the
> ones used in the scheduler or for indexing vmas.
> 
> That said, I checked with the compiler we are using here (gcc 4.7
> variant) and I didn't see any change in the generated code. So, no
> issue here for me.
> 
> If the object code really is different in your setup, please use the
> lib/rbtree_test module to check the performance impact of the change.

I can do that; I had similar results to what Ingo posted. I meant to go
build a 4.9 or 5.0 compiler to see what current GCC makes of it, but
I've not yet gotten around to doing so.

My results were with 4.8.3, IIRC.

> > For 2) I have carefully audited the code and drawn every intermediate
> > link state and not found a loop.
> 
> As Mathieu remarked, we are never modifying the currently active tree,
> so the interrupt case is not the reason for avoiding loops.

Correct, for the proposed use we do not. I did however double (actually
triple) check this property because I feel it's a good property to have:
no matter what you do to the tree, a (simple) lookup will be non-fatal.

But yes, I'll clarify things.

> I think your proposal will work well for the use case you have in mind
> (looking up modules based on address). However, I was wondering how
> you would compare your proposal against an alternative I hard Josh
> Triplett formulate before, where there would be one unique rbtree but
> rotations would allocate new nodes rather than modify the existing
> ones. I think this would be workable as well; I'm just not sure
> whether this would be more or less generally applicable than your
> proposal. Copying Josh in case he wants to chime in.

So I was not aware of that particular solution.

It changes the rb-tree from using internal storage like we do now, to
requiring external storage.

I do have experience with making an RCU safe (in memory) B+tree, and
there the allocations were absolutely killing performance.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-03-01 13:52   ` Mathieu Desnoyers
@ 2015-03-02  8:27     ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  8:27 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: mingo, rusty, oleg, paulmck, linux-kernel, andi, rostedt, tglx,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

On Sun, Mar 01, 2015 at 01:52:09PM +0000, Mathieu Desnoyers wrote:
> >  2) there must not be (temporary) loops in the tree structure in the
> >     modifier's program order, this would cause a lookup which
> >     interrupts the modifier to get stuck indefinitely.
> 
> For (2), I don't think this is how the situation should be described.
> 
> Let's consider a scenario where an interrupt nests over the modification.
> First, the modification will switch the latch to the other version of the
> tree. Therefore, the interrupt will see a fully consistent tree, not the
> tree being modified. Therefore, a temporary loop in the tree should not
> be an issue for that peculiar situation.
> 
> However, if we have another thread traversing the tree while we
> concurrently switch the latch and modify one version of the tree,
> creating a temporary loop in the tree, this thread could possibly:
> 
> A) deadlock with the modification if there is a locking dependency
>    between tree modification, tree read, and another lock (transitive
>    dependency).
> B) have the modifier starved by the other thread, if that thread has
>    a higher scheduling priority (e.g. RT) than the modifier. The high
>    priority thread would then use all its CPU time to perform the
>    temporary loop.
> 
> So I agree that loops are unwanted there: it allows us to never have
> to care about situations A and B. However, the explanation about why
> should not involve, AFAIU, an interrupt handler nesting over the tree
> modification, because this is precisely one scenario that should not
> care about loops.
> 
> Thoughts?

This is true for the (later) proposed latched RB-tree; the description
is, however, true in general. If you somehow did a lookup while doing the
modification and you have loops in program order, you're stuck.

So in the interest of robustness I think we want this property
nonetheless. And it's 'free'; I didn't have to change any code for this.

I shall however clarify this point in the latched RB-tree patch.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
  2015-03-01 14:02   ` Mathieu Desnoyers
@ 2015-03-02  8:33     ` Peter Zijlstra
  2015-03-02  8:51       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  8:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: mingo, rusty, oleg, paulmck, linux-kernel, andi, rostedt, tglx,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

On Sun, Mar 01, 2015 at 02:02:23PM +0000, Mathieu Desnoyers wrote:
> > + * So during the modification, queries are first redirected to data[1]. Then we
> > + * modify data[0]. When that is complete, we redirect queries back to data[0]
> > + * and we can modify data[1].
> > + *
> > + * NOTE: The non-requirement for atomic modifications does _NOT_ include
> > + *       the publishing of new entries in the case where data is a dynamic
> > + *       data structure.
> > + *
> > + *       An iteration might start in data[0] and get suspended long enough
> > + *       to miss an entire modification sequence, once it resumes it might
> > + *       observe the new entry.
> 
> We might want to hint that in the case of dynamic data structures,
> RCU read-side C.S. and grace period should be used together with the
> latch to handle the object teardown.

Can do.

> The latch, AFAIU, takes care of making sure the new objects are
> initialized before being published into the data structure, so there
> would be no need to use RCU assign pointer. However, we really need
> RCU around reads, along with a grace period between removal of an object
> and its teardown.

So I do need the rcu_assign_pointer for the RB link because that also
initializes the rb_node itself. Or put differently, be _very_ _VERY_
sure your entire object is initialized before the latch.

Secondly, note that the latch does a WMB and rcu_assign_pointer does a
RELEASE; these are not equivalent.

So I don't think I will highlight this particular point. If you're sure
enough to know the difference you can get away with it, sure. But in
general I think people should still use rcu_assign_pointer; if only to
make Paul sleep better at night ;-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-03-01 20:09   ` Jiri Kosina
@ 2015-03-02  8:35     ` Peter Zijlstra
  2015-03-02  9:13       ` Jiri Kosina
  2015-03-02  9:21       ` Petr Mladek
  0 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  8:35 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Seth Jennings, Josh Poimboeuf,
	Masami Hiramatsu, Miroslav Benes, Petr Mladek

On Sun, Mar 01, 2015 at 09:09:24PM +0100, Jiri Kosina wrote:
> On Sat, 28 Feb 2015, Peter Zijlstra wrote:
> 
> > While one must hold RCU-sched (aka. preempt_disable) for find_symbol()
> > one must equally hold it over the use of the object returned.
> > 
> > The moment you release the RCU-sched read lock, the object can be dead
> > and gone.
> > 
> > Cc: Seth Jennings <sjenning@redhat.com>
> > Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> > Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > Cc: Miroslav Benes <mbenes@suse.cz>
> > Cc: Petr Mladek <pmladek@suse.cz>
> > Cc: Jiri Kosina <jkosina@suse.cz>
> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Acked-by: Jiri Kosina <jkosina@suse.cz>
> 
> I guess you'll be taking this together with the series, so I am not 
> applying it.

Feel free to take it; this series might take a wee while longer to
mature.

That said; I do have a follow up question on that code. So now you've
successfully obtained an address in module space; but the moment you
release that RCU-sched lock, the module can be gone.

How does the whole live patching stuff deal with module removal during
patching?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
  2015-03-02  8:33     ` Peter Zijlstra
@ 2015-03-02  8:51       ` Peter Zijlstra
  2015-03-02 19:46         ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  8:51 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: mingo, rusty, oleg, paulmck, linux-kernel, andi, rostedt, tglx,
	Michel Lespinasse, Andrea Arcangeli, David Woodhouse,
	Rik van Riel

On Mon, Mar 02, 2015 at 09:33:20AM +0100, Peter Zijlstra wrote:
> On Sun, Mar 01, 2015 at 02:02:23PM +0000, Mathieu Desnoyers wrote:

> > The latch, AFAIU, takes care of making sure the new objects are
> > initialized before being published into the data structure, so there
> > would be no need to use RCU assign pointer. However, we really need
> > RCU around reads, along with a grace period between removal of an object
> > and its teardown.
> 
> So I do need the rcu_assign_pointer for the RB link because that also
> initializes the rb_node itself. Or put differently, be _very_ _VERY_
> sure your entire object is initialized before the latch.
> 
> Secondly, note that the latch does a WMB and rcu_assign_pointer does a
> RELEASE, these are not equivalent.
> 
> So I don't think I will highlight this particular point. If you're sure
> enough to know the difference you can get away with it, sure. But in
> general I think people should still use rcu_assign_pointer; if only to
> make Paul sleep better at night ;-)

Also note that if you do not use rcu_assign_pointer() you will at the
very least require WRITE_ONCE().
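
I.e. something along these lines -- illustrative only, not the actual
tree-link code:

	/* rcu_assign_pointer(): the RELEASE orders the object init before the publish */
	obj->key = key;
	rcu_assign_pointer(parent->rb_left, &obj->node);

	/* without it you still need WRITE_ONCE() plus an explicit barrier;
	 * for the latch that barrier is the smp_wmb() issued around the
	 * sequence update */
	obj->key = key;
	smp_wmb();
	WRITE_ONCE(parent->rb_left, &obj->node);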

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-03-02  8:35     ` Peter Zijlstra
@ 2015-03-02  9:13       ` Jiri Kosina
  2015-03-02 10:00         ` Peter Zijlstra
  2015-03-02  9:21       ` Petr Mladek
  1 sibling, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2015-03-02  9:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Seth Jennings, Josh Poimboeuf,
	Masami Hiramatsu, Miroslav Benes, Petr Mladek

On Mon, 2 Mar 2015, Peter Zijlstra wrote:

> > Acked-by: Jiri Kosina <jkosina@suse.cz>
> > 
> > I guess you'll be taking this together with the series, so I am not 
> > applying it.
> 
> Feel free to take it; this series might take a wee while longer to
> mature.

OK, thanks.

> That said; I do have a follow up question on that code. So now you've
> successfully obtained an address in module space; but the moment you
> release that RCU-sched lock, the module can be gone.
> 
> How does the whole live patching stuff deal with module removal during
> patching?

We have a module notifier that serializes using klp_mutex with the 
patching. Do you see a problem there?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-03-02  8:35     ` Peter Zijlstra
  2015-03-02  9:13       ` Jiri Kosina
@ 2015-03-02  9:21       ` Petr Mladek
  1 sibling, 0 replies; 39+ messages in thread
From: Petr Mladek @ 2015-03-02  9:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Kosina, mingo, rusty, mathieu.desnoyers, oleg, paulmck,
	linux-kernel, andi, rostedt, tglx, Seth Jennings, Josh Poimboeuf,
	Masami Hiramatsu, Miroslav Benes

On Mon 2015-03-02 09:35:30, Peter Zijlstra wrote:
> On Sun, Mar 01, 2015 at 09:09:24PM +0100, Jiri Kosina wrote:
> > On Sat, 28 Feb 2015, Peter Zijlstra wrote:
> > 
> > > While one must hold RCU-sched (aka. preempt_disable) for find_symbol()
> > > one must equally hold it over the use of the object returned.
> > > 
> > > The moment you release the RCU-sched read lock, the object can be dead
> > > and gone.
> > > 
> > > Cc: Seth Jennings <sjenning@redhat.com>
> > > Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> > > Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > > Cc: Miroslav Benes <mbenes@suse.cz>
> > > Cc: Petr Mladek <pmladek@suse.cz>
> > > Cc: Jiri Kosina <jkosina@suse.cz>
> > > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > Acked-by: Jiri Kosina <jkosina@suse.cz>
> > 
> > I guess you'll be taking this together with the series, so I am not 
> > applying it.
> 
> Feel free to take it; this series might take a wee while longer to
> mature.
> 
> That said; I do have a follow up question on that code. So now you've
> successfully obtained an address in module space; but the moment you
> release that RCU-sched lock, the module can be gone.
> 
> How does the whole live patching stuff deal with module removal during
> patching?

There is a notifier, see klp_module_notify(). It applies existing
patches when an affected module is loaded. Also it removes patches
when an affected module is going. It is serialized with the other
operations using the klp_mutex lock.
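
In rough outline (a simplified sketch, not the exact code in
klp_module_notify()):

static int klp_module_notify(struct notifier_block *nb, unsigned long action,
			     void *data)
{
	mutex_lock(&klp_mutex);

	switch (action) {
	case MODULE_STATE_COMING:
		/* apply already-registered patches targeting this module */
		break;
	case MODULE_STATE_GOING:
		/* revert patches targeting this module before it goes away */
		break;
	}

	mutex_unlock(&klp_mutex);
	return NOTIFY_OK;
}

static struct notifier_block klp_module_nb = {
	.notifier_call = klp_module_notify,
};
/* registered via register_module_notifier(&klp_module_nb) at init time */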

Hmm, when I think about it, I am afraid that there is a race. For
example, the going module might be unpatched by the notifier, but
a new patch might get applied while it is still visible via kallsyms.
I am going to look at it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
  2015-03-02  8:23     ` Peter Zijlstra
@ 2015-03-02  9:53       ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02  9:53 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Andrea Arcangeli, David Woodhouse,
	Rik van Riel, Josh Triplett

On Mon, Mar 02, 2015 at 09:23:45AM +0100, Peter Zijlstra wrote:
> It changes the rb-tree from using internal storage like we do now, to
> requiring external storage.

Note that one consequence of using external storage is that tree
iteration will always require two cachelines per level: one to
load the node and one to load the search key.
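
Illustrative structs only (not from any patch):

/* internal storage -- what include/linux/rbtree.h does today: the rb_node
 * is embedded in the object, so the links and the key share a cacheline */
struct item {
	struct rb_node node;
	unsigned long key;
};

/* external storage -- the copy-on-rotation scheme: the tree node only
 * points at the object, so every level touches the node *and* the object
 * it points to before the key can be compared */
struct ext_node {
	struct ext_node *left, *right;
	struct item *item;
};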

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-03-02  9:13       ` Jiri Kosina
@ 2015-03-02 10:00         ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02 10:00 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Seth Jennings, Josh Poimboeuf,
	Masami Hiramatsu, Miroslav Benes, Petr Mladek

On Mon, Mar 02, 2015 at 10:13:05AM +0100, Jiri Kosina wrote:
> On Mon, 2 Mar 2015, Peter Zijlstra wrote:

> > That said; I do have a follow up question on that code. So now you've
> > successfully obtained an address in module space; but the moment you
> > release that RCU-sched lock, the module can be gone.
> > 
> > How does the whole live patching stuff deal with module removal during
> > patching?
> 
> We have a module notifier that serializes using klp_mutex with the 
> patching. Do you see a problem there?

Should work I think; might be worth noting this in a comment above this
lookup though.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 2/9] module: Sanitize RCU usage and locking
  2015-02-28 21:24 ` [RFC][PATCH 2/9] module: Sanitize RCU usage and locking Peter Zijlstra
@ 2015-03-02 11:16   ` Rusty Russell
  2015-03-02 12:37     ` Peter Zijlstra
  2015-03-02 19:37   ` Paul E. McKenney
  1 sibling, 1 reply; 39+ messages in thread
From: Rusty Russell @ 2015-03-02 11:16 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, mathieu.desnoyers, oleg, paulmck,
	Masami Hiramatsu
  Cc: linux-kernel, andi, rostedt, tglx, peterz

Peter Zijlstra <peterz@infradead.org> writes:
> Currently the RCU usage in module is an inconsistent mess of RCU and
> RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
> does not imply synchronize_sched().

Huh?  It's not "an inconsistent mess".  They're all synchronize_rcu(),
except one.

That one is *specifically* a best effort bandaid for the case where
module initialization has failed.  It's theoretically racy, so we wait a
bit before freeing.

That said, I love the new checks, thanks!

> +static inline void module_assert_mutex(void)
> +{
> +	lockdep_assert_held(&module_mutex);
> +}
> +
> +static inline void module_assert_mutex_or_preempt(void)
> +{
> +#ifdef CONFIG_LOCKDEP
> +	int rcu_held = rcu_read_lock_sched_held();
> +	int mutex_held = 1;
> +
> +	if (debug_locks)
> +		mutex_held = lockdep_is_held(&module_mutex);
> +
> +	WARN_ON(!rcu_held && !mutex_held);
> +#endif
> +}

Minor nitpick: I generally avoid static inline in C files (unless
functions are unused under some config options, which these aren't).

In general, they mess up future cleanups, as gcc doesn't warn about
unused functions.
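
With a plain static function gcc will at least tell you when the last
caller goes away, e.g.:

/* plain static: -Wunused-function fires once all callers are gone */
static void module_assert_mutex(void)
{
	lockdep_assert_held(&module_mutex);
}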

Thanks,
Rusty.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 2/9] module: Sanitize RCU usage and locking
  2015-03-02 11:16   ` Rusty Russell
@ 2015-03-02 12:37     ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-02 12:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: mingo, mathieu.desnoyers, oleg, paulmck, Masami Hiramatsu,
	linux-kernel, andi, rostedt, tglx

On Mon, Mar 02, 2015 at 09:46:45PM +1030, Rusty Russell wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > Currently the RCU usage in module is an inconsistent mess of RCU and
> > RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
> > does not imply synchronize_sched().
> 
> Huh?  It's not "an inconsistent mess".  They're all synchronize_rcu(),
> except one.

Uhm, most of them use preempt_disable(), which is RCU-sched, not RCU.

The only RCU user I found was the bug-list thing.

What other RCU users are there?

> That said, I love the new checks, thanks!
> 
> > +static inline void module_assert_mutex(void)
> > +{
> > +	lockdep_assert_held(&module_mutex);
> > +}
> > +
> > +static inline void module_assert_mutex_or_preempt(void)
> > +{
> > +#ifdef CONFIG_LOCKDEP
> > +	int rcu_held = rcu_read_lock_sched_held();
> > +	int mutex_held = 1;
> > +
> > +	if (debug_locks)
> > +		mutex_held = lockdep_is_held(&module_mutex);
> > +
> > +	WARN_ON(!rcu_held && !mutex_held);
> > +#endif
> > +}
> 
> Minor nitpick: I generally avoid static inline in C files (unless
> functions are unused under some config options, which these aren't).
> 
> In general, they mess up future cleanups, as gcc doesn't warn about
> unused functions.

Ah, sure. And I suppose gcc will not emit code for empty static
functions anyhow - which is the reason I stuck the inline on, to avoid
it generating code for the !LOCKDEP case.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
  2015-03-01 20:09   ` Jiri Kosina
  2015-03-02  1:31   ` Masami Hiramatsu
@ 2015-03-02 19:21   ` Paul E. McKenney
  2015-03-02 21:07   ` Josh Poimboeuf
  3 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2015-03-02 19:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi,
	rostedt, tglx, Seth Jennings, Josh Poimboeuf, Masami Hiramatsu,
	Miroslav Benes, Petr Mladek, Jiri Kosina

On Sat, Feb 28, 2015 at 10:24:48PM +0100, Peter Zijlstra wrote:
> While one must hold RCU-sched (aka. preempt_disable) for find_symbol()
> one must equally hold it over the use of the object returned.
> 
> The moment you release the RCU-sched read lock, the object can be dead
> and gone.
> 
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Miroslav Benes <mbenes@suse.cz>
> Cc: Petr Mladek <pmladek@suse.cz>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  kernel/livepatch/core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -248,11 +248,12 @@ static int klp_find_external_symbol(stru
>  	/* first, check if it's an exported symbol */
>  	preempt_disable();
>  	sym = find_symbol(name, NULL, NULL, true, true);
> -	preempt_enable();
>  	if (sym) {
>  		*addr = sym->value;
> +		preempt_enable();
>  		return 0;
>  	}
> +	preempt_enable();
> 
>  	/* otherwise check if it's in another .o within the patch module */
>  	return klp_find_object_symbol(pmod->name, name, addr);
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 2/9] module: Sanitize RCU usage and locking
  2015-02-28 21:24 ` [RFC][PATCH 2/9] module: Sanitize RCU usage and locking Peter Zijlstra
  2015-03-02 11:16   ` Rusty Russell
@ 2015-03-02 19:37   ` Paul E. McKenney
  2015-03-17 17:13     ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2015-03-02 19:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi, rostedt, tglx

On Sat, Feb 28, 2015 at 10:24:49PM +0100, Peter Zijlstra wrote:
> Currently the RCU usage in module is an inconsistent mess of RCU and
> RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
> does not imply synchronize_sched().
> 
> Convert everything over to RCU-sched.
> 
> Furthermore add lockdep asserts to all sites, because its not at all
> clear to me the required locking is observed, esp. on exported
> functions.
> 
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Looks good -- just a couple of questions below.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  include/linux/module.h |   12 ++++++++++--
>  kernel/module.c        |   42 ++++++++++++++++++++++++++++++++++--------
>  lib/bug.c              |    7 +++++--
>  3 files changed, 49 insertions(+), 12 deletions(-)
> 
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -415,14 +415,22 @@ struct symsearch {
>  	bool unused;
>  };
> 
> -/* Search for an exported symbol by name. */
> +/*
> + * Search for an exported symbol by name.
> + *
> + * Must be called with module_mutex held or preemption disabled.
> + */
>  const struct kernel_symbol *find_symbol(const char *name,
>  					struct module **owner,
>  					const unsigned long **crc,
>  					bool gplok,
>  					bool warn);
> 
> -/* Walk the exported symbol table */
> +/*
> + * Walk the exported symbol table
> + *
> + * Must be called with module_mutex held or preemption disabled.
> + */
>  bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
>  				    struct module *owner,
>  				    void *data), void *data);
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -106,6 +106,24 @@ static LIST_HEAD(modules);
>  struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
>  #endif /* CONFIG_KGDB_KDB */
> 
> +static inline void module_assert_mutex(void)
> +{
> +	lockdep_assert_held(&module_mutex);
> +}
> +
> +static inline void module_assert_mutex_or_preempt(void)
> +{
> +#ifdef CONFIG_LOCKDEP
> +	int rcu_held = rcu_read_lock_sched_held();
> +	int mutex_held = 1;
> +
> +	if (debug_locks)
> +		mutex_held = lockdep_is_held(&module_mutex);
> +
> +	WARN_ON(!rcu_held && !mutex_held);
> +#endif
> +}
> +
>  #ifdef CONFIG_MODULE_SIG
>  #ifdef CONFIG_MODULE_SIG_FORCE
>  static bool sig_enforce = true;
> @@ -319,6 +337,8 @@ bool each_symbol_section(bool (*fn)(cons
>  #endif
>  	};
> 
> +	module_assert_mutex_or_preempt();
> +
>  	if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
>  		return true;
> 
> @@ -458,6 +478,8 @@ static struct module *find_module_all(co
>  {
>  	struct module *mod;
> 
> +	module_assert_mutex();
> +
>  	list_for_each_entry(mod, &modules, list) {
>  		if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
>  			continue;
> @@ -1856,8 +1878,8 @@ static void free_module(struct module *m
>  	list_del_rcu(&mod->list);
>  	/* Remove this module from bug list, this uses list_del_rcu */
>  	module_bug_cleanup(mod);
> -	/* Wait for RCU synchronizing before releasing mod->list and buglist. */
> -	synchronize_rcu();
> +	/* Wait for RCU-sched synchronizing before releasing mod->list and buglist. */
> +	synchronize_sched();
>  	mutex_unlock(&module_mutex);

There might be some reasons why the synchronize_sched() needs to be
within the module_mutex critical section, but safe freeing of memory
and the integrity of the list are not two of them.  Doesn't hurt anything
unless someone is being very active with their modules, of course.
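
I.e. something along these lines should work equally well (sketch only,
ignoring whatever else that critical section protects):

	list_del_rcu(&mod->list);
	module_bug_cleanup(mod);
	mutex_unlock(&module_mutex);

	/* the grace period only needs to precede the actual freeing */
	synchronize_sched();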

> 
>  	/* This may be NULL, but that's OK */
> @@ -3106,11 +3128,11 @@ static noinline int do_init_module(struc
>  	mod->init_text_size = 0;
>  	/*
>  	 * We want to free module_init, but be aware that kallsyms may be
> -	 * walking this with preempt disabled.  In all the failure paths,
> -	 * we call synchronize_rcu/synchronize_sched, but we don't want
> -	 * to slow down the success path, so use actual RCU here.
> +	 * walking this with preempt disabled.  In all the failure paths, we
> +	 * call synchronize_sched, but we don't want to slow down the success
> +	 * path, so use actual RCU here.
>  	 */
> -	call_rcu(&freeinit->rcu, do_free_init);
> +	call_rcu_sched(&freeinit->rcu, do_free_init);
>  	mutex_unlock(&module_mutex);
>  	wake_up_all(&module_wq);
> 
> @@ -3368,8 +3390,8 @@ static int load_module(struct load_info
>  	/* Unlink carefully: kallsyms could be walking list. */
>  	list_del_rcu(&mod->list);
>  	wake_up_all(&module_wq);
> -	/* Wait for RCU synchronizing before releasing mod->list. */
> -	synchronize_rcu();
> +	/* Wait for RCU-sched synchronizing before releasing mod->list. */
> +	synchronize_sched();
>  	mutex_unlock(&module_mutex);

Same here.

>   free_module:
>  	/* Free lock-classes; relies on the preceding sync_rcu() */
> @@ -3636,6 +3658,8 @@ int module_kallsyms_on_each_symbol(int (
>  	unsigned int i;
>  	int ret;
> 
> +	module_assert_mutex();
> +
>  	list_for_each_entry(mod, &modules, list) {
>  		if (mod->state == MODULE_STATE_UNFORMED)
>  			continue;
> @@ -3810,6 +3834,8 @@ struct module *__module_address(unsigned
>  	if (addr < module_addr_min || addr > module_addr_max)
>  		return NULL;
> 
> +	module_assert_mutex_or_preempt();
> +
>  	list_for_each_entry_rcu(mod, &modules, list) {
>  		if (mod->state == MODULE_STATE_UNFORMED)
>  			continue;
> --- a/lib/bug.c
> +++ b/lib/bug.c
> @@ -66,7 +66,7 @@ static const struct bug_entry *module_fi
>  	struct module *mod;
>  	const struct bug_entry *bug = NULL;
> 
> -	rcu_read_lock();
> +	rcu_read_lock_sched();
>  	list_for_each_entry_rcu(mod, &module_bug_list, bug_list) {
>  		unsigned i;
> 
> @@ -77,7 +77,7 @@ static const struct bug_entry *module_fi
>  	}
>  	bug = NULL;
>  out:
> -	rcu_read_unlock();
> +	rcu_read_unlock_sched();
> 
>  	return bug;
>  }
> @@ -88,6 +88,8 @@ void module_bug_finalize(const Elf_Ehdr
>  	char *secstrings;
>  	unsigned int i;
> 
> +	lockdep_assert_held(&module_mutex);
> +
>  	mod->bug_table = NULL;
>  	mod->num_bugs = 0;
> 
> @@ -113,6 +115,7 @@ void module_bug_finalize(const Elf_Ehdr
> 
>  void module_bug_cleanup(struct module *mod)
>  {
> +	lockdep_assert_held(&module_mutex);
>  	list_del_rcu(&mod->bug_list);
>  }
> 
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 3/9] module: Annotate module version magic
  2015-02-28 21:24 ` [RFC][PATCH 3/9] module: Annotate module version magic Peter Zijlstra
@ 2015-03-02 19:38   ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2015-03-02 19:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi, rostedt, tglx

On Sat, Feb 28, 2015 at 10:24:50PM +0100, Peter Zijlstra wrote:
> Due to the new lockdep checks we go:
> 
> [    9.759380] ------------[ cut here ]------------
> [    9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130()
> [    9.759391] Modules linked in:
> [    9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65
> [    9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [    9.759396]  ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001
> [    9.759398]  0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8
> [    9.759400]  0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000
> [    9.759400] Call Trace:
> [    9.759407]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
> [    9.759410]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
> [    9.759412]  [<ffffffff810ec160>] ? section_objs+0x60/0x60
> [    9.759414]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
> [    9.759415]  [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130
> [    9.759417]  [<ffffffff810eda01>] find_symbol+0x31/0x70
> [    9.759420]  [<ffffffff810ef5bf>] load_module+0x20f/0x2660
> [    9.759422]  [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0
> [    9.759426]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
> [    9.759427]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
> [    9.759433]  [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
> [    9.759437]  [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [    9.759439]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
> [    9.759441]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
> [    9.759443]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
> [    9.759445] ---[ end trace 9294429076a9c644 ]---
> 
> As per the comment this site should be fine, but let's wrap it in
> preempt_disable() anyhow to placate lockdep.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  kernel/module.c |   12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1192,11 +1192,17 @@ static inline int check_modstruct_versio
>  {
>  	const unsigned long *crc;
> 
> -	/* Since this should be found in kernel (which can't be removed),
> -	 * no locking is necessary. */
> +	/*
> +	 * Since this should be found in kernel (which can't be removed),
> +	 * no locking is necessary.
> +	 */
> +	preempt_disable();
>  	if (!find_symbol(VMLINUX_SYMBOL_STR(module_layout), NULL,
> -			 &crc, true, false))
> +			 &crc, true, false)) {
> +		preempt_enable();
>  		BUG();
> +	}
> +	preempt_enable();
>  	return check_version(sechdrs, versindex,
>  			     VMLINUX_SYMBOL_STR(module_layout), mod, crc,
>  			     NULL);
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 4/9] module, jump_label: Fix module locking
  2015-02-28 21:24 ` [RFC][PATCH 4/9] module, jump_label: Fix module locking Peter Zijlstra
@ 2015-03-02 19:39   ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2015-03-02 19:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi,
	rostedt, tglx, Jason Baron

On Sat, Feb 28, 2015 at 10:24:51PM +0100, Peter Zijlstra wrote:
> As per the module core lockdep annotations:
> 
> [   18.034047] ---[ end trace 9294429076a9c673 ]---
> [   18.047760] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [   18.059228]  ffffffff817d8676 ffff880036683c38 ffffffff8157e98b 0000000000000001
> [   18.067541]  0000000000000000 ffff880036683c78 ffffffff8105fbc7 ffff880036683c68
> [   18.075851]  ffffffffa0046b08 0000000000000000 ffffffffa0046d00 ffffffffa0046cc8
> [   18.084173] Call Trace:
> [   18.086906]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
> [   18.092649]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
> [   18.099361]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
> [   18.105880]  [<ffffffff810ee502>] __module_address+0x1d2/0x1e0
> [   18.112400]  [<ffffffff81161153>] jump_label_module_notify+0x143/0x1e0
> [   18.119710]  [<ffffffff810814bf>] notifier_call_chain+0x4f/0x70
> [   18.126326]  [<ffffffff8108160e>] __blocking_notifier_call_chain+0x5e/0x90
> [   18.134009]  [<ffffffff81081656>] blocking_notifier_call_chain+0x16/0x20
> [   18.141490]  [<ffffffff810f0f00>] load_module+0x1b50/0x2660
> [   18.147720]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
> [   18.154045]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
> [   18.160748] ---[ end trace 9294429076a9c674 ]---
> 
> The jump label code is not doing it right; fix this.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Jason Baron <jbaron@akamai.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  kernel/jump_label.c |   21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -280,6 +280,17 @@ void jump_label_apply_nops(struct module
>  	}
>  }
> 
> +static inline bool address_in_module(unsigned long addr, struct module *mod)
> +{
> +	bool ret;
> +
> +	preempt_disable();
> +	ret = __module_address(addr) == mod;
> +	preempt_enable();
> +
> +	return ret;
> +}
> +
>  static int jump_label_add_module(struct module *mod)
>  {
>  	struct jump_entry *iter_start = mod->jump_entries;
> @@ -302,7 +313,7 @@ static int jump_label_add_module(struct
>  			continue;
> 
>  		key = iterk;
> -		if (__module_address(iter->key) == mod) {
> +		if (address_in_module(iter->key, mod)) {
>  			/*
>  			 * Set key->entries to iter, but preserve JUMP_LABEL_TRUE_BRANCH.
>  			 */
> @@ -339,7 +350,7 @@ static void jump_label_del_module(struct
> 
>  		key = (struct static_key *)(unsigned long)iter->key;
> 
> -		if (__module_address(iter->key) == mod)
> +		if (address_in_module(iter->key, mod))
>  			continue;
> 
>  		prev = &key->next;
> @@ -443,14 +454,16 @@ static void jump_label_update(struct sta
>  {
>  	struct jump_entry *stop = __stop___jump_table;
>  	struct jump_entry *entry = jump_label_get_entries(key);
> -
>  #ifdef CONFIG_MODULES
> -	struct module *mod = __module_address((unsigned long)key);
> +	struct module *mod;
> 
>  	__jump_label_mod_update(key, enable);
> 
> +	preempt_disable();
> +	mod = __module_address((unsigned long)key);
>  	if (mod)
>  		stop = mod->jump_entries + mod->num_jump_entries;
> +	preempt_enable();
>  #endif
>  	/* if there are no users, entry can be NULL */
>  	if (entry)
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch()
  2015-03-02  8:51       ` Peter Zijlstra
@ 2015-03-02 19:46         ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2015-03-02 19:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, mingo, rusty, oleg, linux-kernel, andi,
	rostedt, tglx, Michel Lespinasse, Andrea Arcangeli,
	David Woodhouse, Rik van Riel

On Mon, Mar 02, 2015 at 09:51:16AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 02, 2015 at 09:33:20AM +0100, Peter Zijlstra wrote:
> > On Sun, Mar 01, 2015 at 02:02:23PM +0000, Mathieu Desnoyers wrote:
> 
> > > The latch, AFAIU, takes care of making sure the new objects are
> > > initialized before being published into the data structure, so there
> > > would be no need to use RCU assign pointer. However, we really need
> > > RCU around reads, along with a grace period between removal of an object
> > > and its teardown.
> > 
> > So I do need the rcu_assign_pointer for the RB link because that also
> > initializes the rb_node itself. Or put differently, be _very_ _VERY_
> > sure your entire object is initialized before the latch.
> > 
> > Secondly, note that the latch does a WMB and rcu_assign_pointer does a
> > RELEASE; these are not equivalent.
> > 
> > So I don't think I will highlight this particular point. If you're sure
> > enough to know the difference you can get away with it, sure. But in
> > general I think people should still use rcu_assign_pointer; if only to
> > make Paul sleep better at night ;-)

I do appreciate that sentiment.  ;-)

> Also note that if you do not use rcu_assign_pointer() you will at the
> very least require WRITE_ONCE().

And, for ARM, IA64, and PowerPC, memory-barrier instructions.

							Thanx, Paul
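A minimal sketch contrasting the two publication idioms discussed above; the structure and field names are illustrative, not taken from the series:

	/* new points at a freshly allocated, not yet visible object */
	new->key  = key;
	new->data = data;

	/*
	 * Idiom 1: rcu_assign_pointer() is a RELEASE, so the stores above
	 * are ordered before the pointer becomes visible to readers.
	 */
	rcu_assign_pointer(head->obj, new);

	/*
	 * Idiom 2: publishing with a plain store needs an explicit write
	 * barrier (which raw_write_seqcount_latch() provides) plus
	 * WRITE_ONCE() to avoid store tearing.
	 */
	smp_wmb();
	WRITE_ONCE(head->obj, new);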


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 7/9] rbtree: Implement generic latch_tree
  2015-02-28 21:24 ` [RFC][PATCH 7/9] rbtree: Implement generic latch_tree Peter Zijlstra
  2015-03-01 21:17   ` Michel Lespinasse
@ 2015-03-02 19:53   ` Paul E. McKenney
  2015-03-17 17:24     ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2015-03-02 19:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi,
	rostedt, tglx, Michel Lespinasse, Andrea Arcangeli,
	David Woodhouse, Rik van Riel

On Sat, Feb 28, 2015 at 10:24:54PM +0100, Peter Zijlstra wrote:
> Implement a latched RB-tree in order to get RCU style lookups.
> 
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

The caller of latch_tree_erase() is required to wait for a grace period
before freeing the erased nodes?  Or am I missing something subtle here?

							Thanx, Paul

> ---
>  include/linux/rbtree_latch.h |  140 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 140 insertions(+)
> 
> --- /dev/null
> +++ b/include/linux/rbtree_latch.h
> @@ -0,0 +1,140 @@
> +/*
> + * Latched RB-trees
> + *
> + * Copyright (C) 2015 Intel Corp., Peter Zijlstra <peterz@infradead.org>
> + */
> +
> +#ifndef RB_TREE_LATCH_H
> +#define RB_TREE_LATCH_H
> +
> +#include <linux/rbtree.h>
> +#include <linux/seqlock.h>
> +
> +/*
> + * Since RB-trees have non atomic modifications they're not suited for
> + * RCU/lockless queries.
> + *
> + * Employ the latch technique -- see @raw_write_seqcount_latch -- to implement
> + * a latched RB-tree which does allow this by virtue of always having (at
> + * least) one stable copy of the tree.
> + *
> + * However, while we have the guarantee that there is at all times one stable
> + * copy, this does not guarantee an iteration will not observe modifications.
> + * What might have been a stable copy at the start of the iteration, need not
> + * remain so for the duration of the iteration.
> + *
> + * Therefore, this does require a lockless RB-tree iteration to be non-fatal in
> + * all circumstances; see the comment in lib/rbtree.c.
> + */
> +
> +struct latch_tree_node {
> +	void		*priv;
> +	struct rb_node	node;
> +};
> +
> +struct latch_tree_nodes {
> +	struct latch_tree_node node[2];
> +};
> +
> +struct latch_tree_root {
> +	seqcount_t	seq;
> +	struct rb_root	tree[2];
> +};
> +
> +struct latch_tree_ops {
> +	bool (*less)(struct latch_tree_node *a, struct latch_tree_node *b);
> +	int  (*comp)(void *key,                 struct latch_tree_node *b);
> +};
> +
> +static __always_inline void
> +__lt_insert(struct latch_tree_node *ltn, struct rb_root *root,
> +	    bool (*less)(struct latch_tree_node *a, struct latch_tree_node *b))
> +{
> +	struct rb_node **link = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct latch_tree_node *ltp;
> +
> +	while (*link) {
> +		parent = *link;
> +		ltp = container_of(parent, struct latch_tree_node, node);
> +
> +		if (less(ltn, ltp))
> +			link = &parent->rb_left;
> +		else
> +			link = &parent->rb_right;
> +	}
> +
> +	rb_link_node_rcu(&ltn->node, parent, link);
> +	rb_insert_color(&ltn->node, root);
> +}
> +
> +static __always_inline void
> +__lt_erase(struct latch_tree_node *ltn, struct rb_root *root)
> +{
> +	rb_erase(&ltn->node, root);
> +}
> +
> +static __always_inline struct latch_tree_node *
> +__lt_find(void *key, struct rb_root *root,
> +	  int (*comp)(void *key, struct latch_tree_node *ltn))
> +{
> +	struct rb_node *n = rcu_dereference_raw(root->rb_node);
> +	struct latch_tree_node *ltn;
> +	int c;
> +
> +	while (n) {
> +		ltn = container_of(n, struct latch_tree_node, node);
> +		c = comp(key, ltn);
> +
> +		if (c < 0)
> +			n = rcu_dereference_raw(n->rb_left);
> +		else if (c > 0)
> +			n = rcu_dereference_raw(n->rb_right);
> +		else
> +			return ltn;
> +	}
> +
> +	return NULL;
> +}
> +
> +static __always_inline void
> +latch_tree_insert(struct latch_tree_nodes *nodes,
> +		  struct latch_tree_root *root,
> +		  void *priv,
> +		  const struct latch_tree_ops *ops)
> +{
> +	nodes->node[0].priv = nodes->node[1].priv = priv;
> +
> +	raw_write_seqcount_latch(&root->seq);
> +	__lt_insert(&nodes->node[0], &root->tree[0], ops->less);
> +	raw_write_seqcount_latch(&root->seq);
> +	__lt_insert(&nodes->node[1], &root->tree[1], ops->less);
> +}
> +
> +static __always_inline void
> +latch_tree_erase(struct latch_tree_nodes *nodes,
> +		 struct latch_tree_root *root,
> +		 const struct latch_tree_ops *ops)
> +{
> +	raw_write_seqcount_latch(&root->seq);
> +	__lt_erase(&nodes->node[0], &root->tree[0]);
> +	raw_write_seqcount_latch(&root->seq);
> +	__lt_erase(&nodes->node[1], &root->tree[1]);
> +}
> +
> +static __always_inline struct latch_tree_node *
> +latch_tree_find(void *key, struct latch_tree_root *root,
> +		const struct latch_tree_ops *ops)
> +{
> +	struct latch_tree_node *node;
> +	unsigned int seq;
> +
> +	do {
> +		seq = raw_read_seqcount(&root->seq);
> +		node = __lt_find(key, &root->tree[seq & 1], ops->comp);
> +	} while (read_seqcount_retry(&root->seq, seq));
> +
> +	return node;
> +}
> +
> +#endif /* RB_TREE_LATCH_H */
> 
> 
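A sketch of how the API above is meant to be used, loosely modelled on an address-range lookup; struct foo, its fields and the helper names are hypothetical, not part of the patch:

	struct foo {
		unsigned long		addr;
		unsigned long		size;
		struct latch_tree_nodes	nodes;
	};

	static struct latch_tree_root foo_root;

	static __always_inline struct foo *ltn_to_foo(struct latch_tree_node *n)
	{
		return n->priv;
	}

	static bool foo_less(struct latch_tree_node *a, struct latch_tree_node *b)
	{
		return ltn_to_foo(a)->addr < ltn_to_foo(b)->addr;
	}

	static int foo_comp(void *key, struct latch_tree_node *n)
	{
		unsigned long addr = (unsigned long)key;
		struct foo *f = ltn_to_foo(n);

		if (addr < f->addr)
			return -1;
		if (addr >= f->addr + f->size)
			return 1;
		return 0;
	}

	static const struct latch_tree_ops foo_ops = {
		.less = foo_less,
		.comp = foo_comp,
	};

	/* Updaters must be serialized externally, e.g. by a mutex. */
	static void foo_register(struct foo *f)
	{
		latch_tree_insert(&f->nodes, &foo_root, f, &foo_ops);
	}

	/* Lockless lookup; safe against concurrent insert/erase. */
	static struct foo *foo_find(unsigned long addr)
	{
		struct latch_tree_node *ltn;

		ltn = latch_tree_find((void *)addr, &foo_root, &foo_ops);
		return ltn ? ltn_to_foo(ltn) : NULL;
	}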


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 1/9] klp: Fix obvious RCU fail
  2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
                     ` (2 preceding siblings ...)
  2015-03-02 19:21   ` Paul E. McKenney
@ 2015-03-02 21:07   ` Josh Poimboeuf
  3 siblings, 0 replies; 39+ messages in thread
From: Josh Poimboeuf @ 2015-03-02 21:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rusty, mathieu.desnoyers, oleg, paulmck, linux-kernel,
	andi, rostedt, tglx, Seth Jennings, Masami Hiramatsu,
	Miroslav Benes, Petr Mladek, Jiri Kosina

On Sat, Feb 28, 2015 at 10:24:48PM +0100, Peter Zijlstra wrote:
> While one must hold RCU-sched (aka. preempt_disable) for find_symbol()
> one must equally hold it over the use of the object returned.
> 
> The moment you release the RCU-sched read lock, the object can be dead
> and gone.
> 
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Miroslav Benes <mbenes@suse.cz>
> Cc: Petr Mladek <pmladek@suse.cz>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

> ---
>  kernel/livepatch/core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -248,11 +248,12 @@ static int klp_find_external_symbol(stru
>  	/* first, check if it's an exported symbol */
>  	preempt_disable();
>  	sym = find_symbol(name, NULL, NULL, true, true);
> -	preempt_enable();
>  	if (sym) {
>  		*addr = sym->value;
> +		preempt_enable();
>  		return 0;
>  	}
> +	preempt_enable();
>  
>  	/* otherwise check if it's in another .o within the patch module */
>  	return klp_find_object_symbol(pmod->name, name, addr);
> 
> 

-- 
Josh

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 2/9] module: Sanitize RCU usage and locking
  2015-03-02 19:37   ` Paul E. McKenney
@ 2015-03-17 17:13     ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-17 17:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi, rostedt, tglx

On Mon, Mar 02, 2015 at 11:37:06AM -0800, Paul E. McKenney wrote:
> > -	/* Wait for RCU synchronizing before releasing mod->list and buglist. */
> > -	synchronize_rcu();
> > +	/* Wait for RCU-sched synchronizing before releasing mod->list and buglist. */
> > +	synchronize_sched();
> >  	mutex_unlock(&module_mutex);
> 
> There might be some reasons why the synchronize_sched() needs to be
> within the module_mutex critical section, but safe freeing of memory
> and the integrity of the list are not two of them.  Doesn't hurt anything
> unless someone is being very active with their modules, of course.

Yeah, I could not find anything that relies on that; but I'm not too
familiar with all the module stuff -- although a lot more now than I was
a few weeks back ;-)
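
A sketch of the ordering Paul describes, illustrative rather than a proposed change:

	mutex_lock(&module_mutex);
	list_del_rcu(&mod->list);	/* list integrity needs only the mutex */
	mutex_unlock(&module_mutex);

	/*
	 * The grace period only has to precede freeing or reusing the memory
	 * holding mod->list, so it can sit outside the critical section.
	 */
	synchronize_sched();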

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 7/9] rbtree: Implement generic latch_tree
  2015-03-02 19:53   ` Paul E. McKenney
@ 2015-03-17 17:24     ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-03-17 17:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, rusty, mathieu.desnoyers, oleg, linux-kernel, andi,
	rostedt, tglx, Michel Lespinasse, Andrea Arcangeli,
	David Woodhouse, Rik van Riel

On Mon, Mar 02, 2015 at 11:53:32AM -0800, Paul E. McKenney wrote:
> On Sat, Feb 28, 2015 at 10:24:54PM +0100, Peter Zijlstra wrote:
> > Implement a latched RB-tree in order to get RCU style lookups.
> > 
> > Cc: Michel Lespinasse <walken@google.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: David Woodhouse <David.Woodhouse@intel.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> The caller of latch_tree_erase() is required to wait for a grace period
> before freeing the erased nodes?  Or am I missing something subtle here?

Correct; let me clarify this.
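
The usage implied by that answer, continuing the hypothetical foo example sketched earlier in the thread:

	/* Updater, serialized against other updaters. */
	latch_tree_erase(&f->nodes, &foo_root, &foo_ops);

	/*
	 * Wait for every lockless latch_tree_find() walker that might still
	 * see the old tree copy; only then may the nodes' memory be reused.
	 */
	synchronize_sched();	/* or whichever RCU flavour the readers use */

	kfree(f);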

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2015-03-17 17:24 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
2015-03-01 20:09   ` Jiri Kosina
2015-03-02  8:35     ` Peter Zijlstra
2015-03-02  9:13       ` Jiri Kosina
2015-03-02 10:00         ` Peter Zijlstra
2015-03-02  9:21       ` Petr Mladek
2015-03-02  1:31   ` Masami Hiramatsu
2015-03-02 19:21   ` Paul E. McKenney
2015-03-02 21:07   ` Josh Poimboeuf
2015-02-28 21:24 ` [RFC][PATCH 2/9] module: Sanitize RCU usage and locking Peter Zijlstra
2015-03-02 11:16   ` Rusty Russell
2015-03-02 12:37     ` Peter Zijlstra
2015-03-02 19:37   ` Paul E. McKenney
2015-03-17 17:13     ` Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 3/9] module: Annotate module version magic Peter Zijlstra
2015-03-02 19:38   ` Paul E. McKenney
2015-02-28 21:24 ` [RFC][PATCH 4/9] module, jump_label: Fix module locking Peter Zijlstra
2015-03-02 19:39   ` Paul E. McKenney
2015-02-28 21:24 ` [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal Peter Zijlstra
2015-03-01 13:52   ` Mathieu Desnoyers
2015-03-02  8:27     ` Peter Zijlstra
2015-03-01 21:11   ` Michel Lespinasse
2015-03-02  7:46     ` Ingo Molnar
2015-03-02  8:23     ` Peter Zijlstra
2015-03-02  9:53       ` Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
2015-03-01 14:02   ` Mathieu Desnoyers
2015-03-02  8:33     ` Peter Zijlstra
2015-03-02  8:51       ` Peter Zijlstra
2015-03-02 19:46         ` Paul E. McKenney
2015-03-01 21:12   ` Michel Lespinasse
2015-02-28 21:24 ` [RFC][PATCH 7/9] rbtree: Implement generic latch_tree Peter Zijlstra
2015-03-01 21:17   ` Michel Lespinasse
2015-03-02  8:05     ` Peter Zijlstra
2015-03-02 19:53   ` Paul E. McKenney
2015-03-17 17:24     ` Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 8/9] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 9/9] module: Use __module_address() for module_address_lookup() Peter Zijlstra
