* [PATCH 0/2] jump label: 2.6.38 updates
@ 2011-01-05 15:43 Jason Baron
  2011-01-05 15:43 ` [PATCH 1/2] jump label: make enable/disable o(1) Jason Baron
                   ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Jason Baron @ 2011-01-05 15:43 UTC (permalink / raw)
  To: peterz, mathieu.desnoyers, hpa, rostedt, mingo
  Cc: tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi,
	davem, sam, ddaney, michael, linux-kernel

Hi,

The first patch uses the storage space of the jump label key address
as a pointer into the update table. In this way, we can find all
the addresses that need to be updated without hashing.

The second patch introduces:

static __always_inline bool static_branch(struct jump_label_key *key);

instead of the old JUMP_LABEL(key, label) macro.

In this way, jump labels become really easy to use:

Define:

        struct jump_label_key jump_key;

Can be used as:

        if (static_branch(&jump_key))
                do unlikely code

Enable/disable via:

        jump_label_enable(&jump_key);
        jump_label_disable(&jump_key);

that's it!
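
To make the calling convention concrete, here is a minimal user-space sketch
of the fallback variant (the one used when jump labels are not available),
where the key is a plain flag. The names hot_function() and slow_path_hits
are hypothetical and exist only for illustration; the real jump label build
replaces the flag test with a patched nop/jmp.

```c
#include <assert.h>
#include <stdbool.h>

/* Branch hint as used in the kernel; relies on a GCC/clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Fallback key layout: a plain flag when jump labels are unavailable. */
struct jump_label_key {
	int state;
};

/* Returns true only when the key has been enabled. */
static inline bool static_branch(struct jump_label_key *key)
{
	return unlikely(key->state);
}

static inline void jump_label_enable(struct jump_label_key *key)
{
	key->state = 1;
}

static inline void jump_label_disable(struct jump_label_key *key)
{
	key->state = 0;
}

/* Hypothetical call site: the counter stands in for the unlikely code. */
static struct jump_label_key trace_key;
static int slow_path_hits;

static inline void hot_function(void)
{
	if (static_branch(&trace_key))
		slow_path_hits++;
}

/* Small self-check that walks through disable -> enable -> disable. */
static inline void demo(void)
{
	hot_function();
	assert(slow_path_hits == 0);	/* key starts disabled */
	jump_label_enable(&trace_key);
	hot_function();
	assert(slow_path_hits == 1);	/* unlikely code ran once */
	jump_label_disable(&trace_key);
	hot_function();
	assert(slow_path_hits == 1);	/* skipped again */
}
```

The caller never names a label; the key is the only thing shared between the
call site and the enable/disable side.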

For perf, which also uses jump labels, I've left the reference counting
out of the jump label layer, thus removing the 'jump_label_inc()' and
'jump_label_dec()' interface. Hopefully, this is a more palatable solution.
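
As a rough illustration of what the caller-side counting looks like, the
sketch below pairs an atomic counter with a key and only touches the key on
the 0 -> 1 and 1 -> 0 transitions. It is a user-space model using C11
atomics and the flag-based fallback key; the helper names counter_get() and
counter_put() are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Flag-based fallback key, as in the !HAVE_JUMP_LABEL case. */
struct jump_label_key {
	int state;
};

static inline void jump_label_enable(struct jump_label_key *key)
{
	key->state = 1;
}

static inline void jump_label_disable(struct jump_label_key *key)
{
	key->state = 0;
}

/* Counter plus key, modeled on struct jump_label_key_counter. */
struct jump_label_key_counter {
	atomic_int ref;
	struct jump_label_key key;
};

/* Hypothetical helper: the first user enables the key. */
static inline void counter_get(struct jump_label_key_counter *c)
{
	/* atomic_fetch_add() returns the old value; 0 means first user. */
	if (atomic_fetch_add(&c->ref, 1) == 0)
		jump_label_enable(&c->key);
}

/* Hypothetical helper: the last user disables the key. */
static inline void counter_put(struct jump_label_key_counter *c)
{
	/* An old value of 1 means the count just dropped to zero. */
	if (atomic_fetch_sub(&c->ref, 1) == 1)
		jump_label_disable(&c->key);
}

/* Small self-check: two users, key stays on until both are gone. */
static inline void demo(void)
{
	struct jump_label_key_counter c;

	atomic_init(&c.ref, 0);
	c.key.state = 0;

	counter_get(&c);
	counter_get(&c);
	assert(c.key.state == 1);
	counter_put(&c);
	assert(c.key.state == 1);	/* one user still holds it */
	counter_put(&c);
	assert(c.key.state == 0);
}
```

Note that this model does not serialize the counter update with the
enable/disable call; it only shows where the counting lives.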

Thanks to H. Peter Anvin for suggesting the simpler 'static_branch()'
function.

thanks,

-Jason




Jason Baron (2):
  jump label: make enable/disable o(1)
  jump label: introduce static_branch()

 arch/sparc/include/asm/jump_label.h |   25 ++++---
 arch/x86/include/asm/jump_label.h   |   22 ++++---
 arch/x86/kernel/jump_label.c        |    2 +-
 include/linux/dynamic_debug.h       |   24 ++-----
 include/linux/jump_label.h          |   66 ++++++++++--------
 include/linux/jump_label_ref.h      |   36 +++-------
 include/linux/perf_event.h          |   28 ++++----
 include/linux/tracepoint.h          |    8 +--
 kernel/jump_label.c                 |  129 +++++++++++++++++++++++++++--------
 kernel/perf_event.c                 |   24 ++++--
 kernel/tracepoint.c                 |   22 ++----
 11 files changed, 226 insertions(+), 160 deletions(-)



* [PATCH 1/2] jump label: make enable/disable o(1)
  2011-01-05 15:43 [PATCH 0/2] jump label: 2.6.38 updates Jason Baron
@ 2011-01-05 15:43 ` Jason Baron
  2011-01-05 17:31   ` Steven Rostedt
  2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
  2011-02-11 19:25 ` [PATCH 0/2] jump label: 2.6.38 updates Peter Zijlstra
  2 siblings, 1 reply; 113+ messages in thread
From: Jason Baron @ 2011-01-05 15:43 UTC (permalink / raw)
  To: peterz, mathieu.desnoyers, hpa, rostedt, mingo
  Cc: tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi,
	davem, sam, ddaney, michael, linux-kernel

Previously, I allowed any variable type to be used as the 'key' for
the jump label. However, by enforcing a type, we can make use of the
contents of the 'key'. This patch thus introduces:

struct jump_label_key {
       void *ptr;
};

The 'ptr' is used as a pointer into the jump label table of the
corresponding addresses that need to be updated. Thus, when jump labels
are enabled/disabled, we find the entries to update in constant time.
There is no longer any hashing.
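
The idea can be modeled in user space as follows. This is a simplified
sketch, not the kernel code: struct jump_entry is reduced to an address and
an 'enabled' flag that stands in for the actual text patching, module lists
are left out, and a key with no entry is simply ignored rather than having
an entry allocated for it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified jump site: 'enabled' stands in for patching nop <-> jmp. */
struct jump_entry {
	unsigned long code;
	int enabled;
};

/* Per-key bookkeeping, reduced to what the lookup needs. */
struct jump_label_entry {
	struct jump_entry *table;
	unsigned int nr_entries;
	int refcount;
};

/* The key stores a direct pointer to its entry. */
struct jump_label_key {
	void *ptr;
};

/* Walk every site that belongs to this entry. */
static inline void jump_label_update(struct jump_label_entry *e, int enable)
{
	unsigned int i;

	for (i = 0; i < e->nr_entries; i++)
		e->table[i].enabled = enable;
}

static inline void jump_label_enable(struct jump_label_key *key)
{
	/* Direct dereference: no hash computation, no bucket walk. */
	struct jump_label_entry *e = key->ptr;

	if (!e || e->refcount)
		return;
	jump_label_update(e, 1);
	e->refcount = 1;
}

static inline void jump_label_disable(struct jump_label_key *key)
{
	struct jump_label_entry *e = key->ptr;

	if (!e || !e->refcount)
		return;
	jump_label_update(e, 0);
	e->refcount = 0;
}

/* Small self-check: one key with three jump sites. */
static inline void demo(void)
{
	struct jump_entry sites[3] = { { 0x10, 0 }, { 0x20, 0 }, { 0x30, 0 } };
	struct jump_label_entry entry = { sites, 3, 0 };
	struct jump_label_key key = { &entry };

	jump_label_enable(&key);
	assert(sites[0].enabled && sites[1].enabled && sites[2].enabled);
	assert(entry.refcount == 1);
	jump_label_disable(&key);
	assert(!sites[0].enabled && !sites[1].enabled && !sites[2].enabled);
	assert(entry.refcount == 0);
}
```

Finding the entry is a single pointer dereference; updating the sites is
still a walk over every site that uses the key.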

When jump labels are disabled we simply have:

struct jump_label_key {
        int state;
};

I tested enable/disable times on x86 on a quad core via:

 time echo 1 > /sys/kernel/debug/tracing/events/enable

With this patch, runs average .03s. Prior to the jump label infrastructure
this command averaged around .01s.

We can speed this path up further via batching the enable/disables.

thanks,

-Jason

Signed-off-by: Jason Baron <jbaron@redhat.com>
---
 include/linux/dynamic_debug.h  |    6 +-
 include/linux/jump_label.h     |   46 +++++++++-----
 include/linux/jump_label_ref.h |   34 +++--------
 include/linux/perf_event.h     |    8 ++-
 include/linux/tracepoint.h     |    6 +-
 kernel/jump_label.c            |  127 +++++++++++++++++++++++++++++++---------
 kernel/perf_event.c            |   24 +++++---
 kernel/tracepoint.c            |   22 +++-----
 8 files changed, 172 insertions(+), 101 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index a90b389..ddf7bae 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -33,7 +33,7 @@ struct _ddebug {
 #define _DPRINTK_FLAGS_PRINT   (1<<0)  /* printk() a message using the format */
 #define _DPRINTK_FLAGS_DEFAULT 0
 	unsigned int flags:8;
-	char enabled;
+	struct jump_label_key enabled;
 } __attribute__((aligned(8)));
 
 
@@ -50,7 +50,7 @@ extern int ddebug_remove_module(const char *mod_name);
 	__used								\
 	__attribute__((section("__verbose"), aligned(8))) =		\
 	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
-		_DPRINTK_FLAGS_DEFAULT };				\
+		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
 	JUMP_LABEL(&descriptor.enabled, do_printk);			\
 	goto out;							\
 do_printk:								\
@@ -66,7 +66,7 @@ out:	;								\
 	__used								\
 	__attribute__((section("__verbose"), aligned(8))) =		\
 	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
-		_DPRINTK_FLAGS_DEFAULT };				\
+		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
 	JUMP_LABEL(&descriptor.enabled, do_printk);			\
 	goto out;							\
 do_printk:								\
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7880f18..152f7de 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -2,6 +2,11 @@
 #define _LINUX_JUMP_LABEL_H
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+
+struct jump_label_key {
+	void *ptr;
+};
+
 # include <asm/jump_label.h>
 # define HAVE_JUMP_LABEL
 #endif
@@ -13,6 +18,8 @@ enum jump_label_type {
 
 struct module;
 
+#define JUMP_LABEL_INIT { 0 }
+
 #ifdef HAVE_JUMP_LABEL
 
 extern struct jump_entry __start___jump_table[];
@@ -23,33 +30,38 @@ extern void jump_label_unlock(void);
 extern void arch_jump_label_transform(struct jump_entry *entry,
 				 enum jump_label_type type);
 extern void arch_jump_label_text_poke_early(jump_label_t addr);
-extern void jump_label_update(unsigned long key, enum jump_label_type type);
 extern void jump_label_apply_nops(struct module *mod);
 extern int jump_label_text_reserved(void *start, void *end);
-
-#define jump_label_enable(key) \
-	jump_label_update((unsigned long)key, JUMP_LABEL_ENABLE);
-
-#define jump_label_disable(key) \
-	jump_label_update((unsigned long)key, JUMP_LABEL_DISABLE);
+extern int jump_label_enabled(struct jump_label_key *key);
+extern void jump_label_enable(struct jump_label_key *key);
+extern void jump_label_disable(struct jump_label_key *key);
 
 #else
 
+struct jump_label_key {
+	int state;
+};
+
 #define JUMP_LABEL(key, label)			\
 do {						\
-	if (unlikely(*key))			\
+	if (unlikely(((struct jump_label_key *)key)->state))		\
 		goto label;			\
 } while (0)
 
-#define jump_label_enable(cond_var)	\
-do {					\
-       *(cond_var) = 1;			\
-} while (0)
+static inline int jump_label_enabled(struct jump_label_key *key)
+{
+	return key->state;
+}
 
-#define jump_label_disable(cond_var)	\
-do {					\
-       *(cond_var) = 0;			\
-} while (0)
+static inline void jump_label_enable(struct jump_label_key *key)
+{
+	key->state = 1;
+}
+
+static inline void jump_label_disable(struct jump_label_key *key)
+{
+	key->state = 0;
+}
 
 static inline int jump_label_apply_nops(struct module *mod)
 {
@@ -69,7 +81,7 @@ static inline void jump_label_unlock(void) {}
 #define COND_STMT(key, stmt)					\
 do {								\
 	__label__ jl_enabled;					\
-	JUMP_LABEL(key, jl_enabled);				\
+	JUMP_LABEL_ELSE_ATOMIC_READ(key, jl_enabled);		\
 	if (0) {						\
 jl_enabled:							\
 		stmt;						\
diff --git a/include/linux/jump_label_ref.h b/include/linux/jump_label_ref.h
index e5d012a..8a76e89 100644
--- a/include/linux/jump_label_ref.h
+++ b/include/linux/jump_label_ref.h
@@ -4,38 +4,20 @@
 #include <linux/jump_label.h>
 #include <asm/atomic.h>
 
-#ifdef HAVE_JUMP_LABEL
-
-static inline void jump_label_inc(atomic_t *key)
-{
-	if (atomic_add_return(1, key) == 1)
-		jump_label_enable(key);
+struct jump_label_key_counter {
+	atomic_t ref;
+	struct jump_label_key key;
 }
 
-static inline void jump_label_dec(atomic_t *key)
-{
-	if (atomic_dec_and_test(key))
-		jump_label_disable(key);
-}
-
-#else /* !HAVE_JUMP_LABEL */
+#ifdef HAVE_JUMP_LABEL
 
-static inline void jump_label_inc(atomic_t *key)
-{
-	atomic_inc(key);
-}
+#define JUMP_LABEL_ELSE_ATOMIC_READ(key, label, counter) JUMP_LABEL(key, label)
 
-static inline void jump_label_dec(atomic_t *key)
-{
-	atomic_dec(key);
-}
+#else /* !HAVE_JUMP_LABEL */
 
-#undef JUMP_LABEL
-#define JUMP_LABEL(key, label)						\
+#define JUMP_LABEL_ELSE_ATOMIC_READ(key, label, counter)		\
 do {									\
-	if (unlikely(__builtin_choose_expr(				\
-	      __builtin_types_compatible_p(typeof(key), atomic_t *),	\
-	      atomic_read((atomic_t *)(key)), *(key))))			\
+	if (unlikely(atomic_read((atomic_t *)counter)))			\
 		goto label;						\
 } while (0)
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index dda5b0a..94834ce 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1000,7 +1000,7 @@ static inline int is_software_event(struct perf_event *event)
 	return event->pmu->task_ctx_nr == perf_sw_context;
 }
 
-extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+extern struct jump_label_key_counter perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
 extern void __perf_sw_event(u32, u64, int, struct pt_regs *, u64);
 
@@ -1029,7 +1029,9 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 {
 	struct pt_regs hot_regs;
 
-	JUMP_LABEL(&perf_swevent_enabled[event_id], have_event);
+	JUMP_LABEL_ELSE_ATOMIC_READ(&perf_swevent_enabled[event_id].key,
+				    have_event,
+				    &perf_swevent_enabled[event_id].ref);
 	return;
 
 have_event:
@@ -1040,7 +1042,7 @@ have_event:
 	__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
-extern atomic_t perf_task_events;
+extern struct jump_label_key_counter perf_task_events;
 
 static inline void perf_event_task_sched_in(struct task_struct *task)
 {
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d3e4f87..2ff00e5 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -29,7 +29,7 @@ struct tracepoint_func {
 
 struct tracepoint {
 	const char *name;		/* Tracepoint name */
-	int state;			/* State. */
+	struct jump_label_key key;
 	void (*regfunc)(void);
 	void (*unregfunc)(void);
 	struct tracepoint_func *funcs;
@@ -149,7 +149,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
-		JUMP_LABEL(&__tracepoint_##name.state, do_trace);	\
+		JUMP_LABEL(&__tracepoint_##name.key, do_trace);		\
 		return;							\
 do_trace:								\
 			__DO_TRACE(&__tracepoint_##name,		\
@@ -179,7 +179,7 @@ do_trace:								\
 	__attribute__((section("__tracepoints_strings"))) = #name;	\
 	struct tracepoint __tracepoint_##name				\
 	__attribute__((section("__tracepoints"), aligned(32))) =	\
-		{ __tpstrtab_##name, 0, reg, unreg, NULL }
+		{ __tpstrtab_##name, JUMP_LABEL_INIT, reg, unreg, NULL }
 
 #define DEFINE_TRACE(name)						\
 	DEFINE_TRACE_FN(name, NULL, NULL);
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3b79bd9..b6d461c 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -26,10 +26,11 @@ static DEFINE_MUTEX(jump_label_mutex);
 struct jump_label_entry {
 	struct hlist_node hlist;
 	struct jump_entry *table;
-	int nr_entries;
 	/* hang modules off here */
 	struct hlist_head modules;
 	unsigned long key;
+	u32 nr_entries;
+	int refcount;
 };
 
 struct jump_label_module_entry {
@@ -105,11 +106,14 @@ add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
 
 	hash = jhash((void *)&key, sizeof(jump_label_t), 0);
 	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
-	e->key = key;
+	e->key = (unsigned long)key;
 	e->table = table;
 	e->nr_entries = nr_entries;
+	e->refcount = 0;
 	INIT_HLIST_HEAD(&(e->modules));
 	hlist_add_head(&e->hlist, head);
+	((struct jump_label_key *)(unsigned long)key)->ptr = e;
+
 	return e;
 }
 
@@ -154,37 +158,91 @@ build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
  *
  */
 
-void jump_label_update(unsigned long key, enum jump_label_type type)
+static void jump_label_update(struct jump_label_entry *entry, enum jump_label_type type)
 {
 	struct jump_entry *iter;
-	struct jump_label_entry *entry;
 	struct hlist_node *module_node;
 	struct jump_label_module_entry *e_module;
 	int count;
 
-	jump_label_lock();
-	entry = get_jump_label_entry((jump_label_t)key);
-	if (entry) {
-		count = entry->nr_entries;
-		iter = entry->table;
+	count = entry->nr_entries;
+	iter = entry->table;
+	while (count--) {
+		if (kernel_text_address(iter->code))
+			arch_jump_label_transform(iter, type);
+		iter++;
+	}
+	/* enable/disable jump labels in modules */
+	hlist_for_each_entry(e_module, module_node, &(entry->modules),
+						hlist) {
+		count = e_module->nr_entries;
+		iter = e_module->table;
 		while (count--) {
-			if (kernel_text_address(iter->code))
+			if (iter->key && kernel_text_address(iter->code))
 				arch_jump_label_transform(iter, type);
 			iter++;
 		}
-		/* eanble/disable jump labels in modules */
-		hlist_for_each_entry(e_module, module_node, &(entry->modules),
-							hlist) {
-			count = e_module->nr_entries;
-			iter = e_module->table;
-			while (count--) {
-				if (iter->key &&
-						kernel_text_address(iter->code))
-					arch_jump_label_transform(iter, type);
-				iter++;
-			}
-		}
 	}
+}
+
+static struct jump_label_entry *get_jump_label_entry_key(struct jump_label_key *key)
+{
+	struct jump_label_entry *entry;
+
+	entry = (struct jump_label_entry *)key->ptr;
+	if (!entry) {
+		entry = add_jump_label_entry((jump_label_t)(unsigned long)key, 0, NULL);
+		if (IS_ERR(entry))
+			return NULL;
+	}
+	return entry;
+}
+
+int jump_label_enabled(struct jump_label_key *key)
+{
+	struct jump_label_entry *entry;
+	int enabled = 0;
+
+	jump_label_lock();
+	entry = get_jump_label_entry_key(key);
+	if (!entry)
+		goto out;
+	enabled = !!entry->refcount;
+out:
+	jump_label_unlock();
+	return enabled;
+}
+
+
+void jump_label_enable(struct jump_label_key *key)
+{
+	struct jump_label_entry *entry;
+
+	jump_label_lock();
+	entry = get_jump_label_entry_key(key);
+	if (!entry)
+		goto out;
+	if (!entry->refcount) {
+		jump_label_update(entry, JUMP_LABEL_ENABLE);
+		entry->refcount = 1;
+	}
+out:
+	jump_label_unlock();
+}
+
+void jump_label_disable(struct jump_label_key *key)
+{
+	struct jump_label_entry *entry;
+
+	jump_label_lock();
+	entry = get_jump_label_entry_key(key);
+	if (!entry)
+		goto out;
+	if (entry->refcount) {
+		jump_label_update(entry, JUMP_LABEL_DISABLE);
+		entry->refcount = 0;
+	}
+out:
 	jump_label_unlock();
 }
 
@@ -305,6 +363,7 @@ add_jump_label_module_entry(struct jump_label_entry *entry,
 			    int count, struct module *mod)
 {
 	struct jump_label_module_entry *e;
+	struct jump_entry *iter;
 
 	e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
 	if (!e)
@@ -313,6 +372,13 @@ add_jump_label_module_entry(struct jump_label_entry *entry,
 	e->nr_entries = count;
 	e->table = iter_begin;
 	hlist_add_head(&e->hlist, &entry->modules);
+	if (entry->refcount) {
+		iter = iter_begin;
+		while (count--) {
+			arch_jump_label_transform(iter, JUMP_LABEL_ENABLE);
+			iter++;
+		}
+	}
 	return e;
 }
 
@@ -360,10 +426,6 @@ static void remove_jump_label_module(struct module *mod)
 	struct jump_label_module_entry *e_module;
 	int i;
 
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
-		return;
-
 	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
 		head = &jump_label_table[i];
 		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
@@ -375,10 +437,21 @@ static void remove_jump_label_module(struct module *mod)
 					kfree(e_module);
 				}
 			}
+		}
+	}
+	/* now check if any keys can be removed */
+	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+		head = &jump_label_table[i];
+		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+			if (!within_module_core(e->key, mod))
+				continue;
 			if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
 				hlist_del(&e->hlist);
 				kfree(e);
+				continue;
 			}
+			WARN(1, KERN_ERR "jump label: "
+				"trying to remove used key: %lu !\n", e->key);
 		}
 	}
 }
@@ -470,7 +543,7 @@ void jump_label_apply_nops(struct module *mod)
 
 struct notifier_block jump_label_module_nb = {
 	.notifier_call = jump_label_module_notify,
-	.priority = 0,
+	.priority = 1, /* higher than tracepoints */
 };
 
 static __init int init_jump_label_module(void)
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 11847bf..f96d615 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -38,7 +38,7 @@
 
 #include <asm/irq_regs.h>
 
-atomic_t perf_task_events __read_mostly;
+struct jump_label_key_counter perf_task_events __read_mostly;
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
@@ -2292,8 +2292,10 @@ static void free_event(struct perf_event *event)
 	irq_work_sync(&event->pending);
 
 	if (!event->parent) {
-		if (event->attach_state & PERF_ATTACH_TASK)
-			jump_label_dec(&perf_task_events);
+		if (event->attach_state & PERF_ATTACH_TASK) {
+			if (atomic_dec_and_test(&perf_task_events.ref))
+				jump_label_disable(&perf_task_events.key);
+		}
 		if (event->attr.mmap || event->attr.mmap_data)
 			atomic_dec(&nr_mmap_events);
 		if (event->attr.comm)
@@ -4821,7 +4823,7 @@ fail:
 	return err;
 }
 
-atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+struct jump_label_key_counter perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
 static void sw_perf_event_destroy(struct perf_event *event)
 {
@@ -4829,7 +4831,8 @@ static void sw_perf_event_destroy(struct perf_event *event)
 
 	WARN_ON(event->parent);
 
-	jump_label_dec(&perf_swevent_enabled[event_id]);
+	if (atomic_dec_and_test(&perf_swevent_enabled[event_id].ref))
+		jump_label_disable(&perf_swevent_enabled[event_id].key);
 	swevent_hlist_put(event);
 }
 
@@ -4854,12 +4857,15 @@ static int perf_swevent_init(struct perf_event *event)
 
 	if (!event->parent) {
 		int err;
+		atomic_t *ref;
 
 		err = swevent_hlist_get(event);
 		if (err)
 			return err;
 
-		jump_label_inc(&perf_swevent_enabled[event_id]);
+		ref = &perf_swevent_enabled[event_id].ref;
+		if (atomic_add_return(1, ref) == 1)
+			jump_label_enable(&perf_swevent_enabled[event_id].key);
 		event->destroy = sw_perf_event_destroy;
 	}
 
@@ -5614,8 +5620,10 @@ done:
 	event->pmu = pmu;
 
 	if (!event->parent) {
-		if (event->attach_state & PERF_ATTACH_TASK)
-			jump_label_inc(&perf_task_events);
+		if (event->attach_state & PERF_ATTACH_TASK) {
+			if (atomic_add_return(1, &perf_task_events.ref) == 1)
+				jump_label_enable(&perf_task_events.key);
+		}
 		if (event->attr.mmap || event->attr.mmap_data)
 			atomic_inc(&nr_mmap_events);
 		if (event->attr.comm)
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index e95ee7f..d54b434 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -251,9 +251,9 @@ static void set_tracepoint(struct tracepoint_entry **entry,
 {
 	WARN_ON(strcmp((*entry)->name, elem->name) != 0);
 
-	if (elem->regfunc && !elem->state && active)
+	if (elem->regfunc && !jump_label_enabled(&elem->key) && active)
 		elem->regfunc();
-	else if (elem->unregfunc && elem->state && !active)
+	else if (elem->unregfunc && jump_label_enabled(&elem->key) && !active)
 		elem->unregfunc();
 
 	/*
@@ -264,13 +264,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
 	 * is used.
 	 */
 	rcu_assign_pointer(elem->funcs, (*entry)->funcs);
-	if (!elem->state && active) {
-		jump_label_enable(&elem->state);
-		elem->state = active;
-	} else if (elem->state && !active) {
-		jump_label_disable(&elem->state);
-		elem->state = active;
-	}
+	if (active)
+		jump_label_enable(&elem->key);
+	else if (!active)
+		jump_label_disable(&elem->key);
 }
 
 /*
@@ -281,13 +278,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
  */
 static void disable_tracepoint(struct tracepoint *elem)
 {
-	if (elem->unregfunc && elem->state)
+	if (elem->unregfunc && jump_label_enabled(&elem->key))
 		elem->unregfunc();
 
-	if (elem->state) {
-		jump_label_disable(&elem->state);
-		elem->state = 0;
-	}
+	jump_label_disable(&elem->key);
 	rcu_assign_pointer(elem->funcs, NULL);
 }
 
-- 
1.7.1



* [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 15:43 [PATCH 0/2] jump label: 2.6.38 updates Jason Baron
  2011-01-05 15:43 ` [PATCH 1/2] jump label: make enable/disable o(1) Jason Baron
@ 2011-01-05 15:43 ` Jason Baron
  2011-01-05 17:15   ` Frederic Weisbecker
                     ` (3 more replies)
  2011-02-11 19:25 ` [PATCH 0/2] jump label: 2.6.38 updates Peter Zijlstra
  2 siblings, 4 replies; 113+ messages in thread
From: Jason Baron @ 2011-01-05 15:43 UTC (permalink / raw)
  To: peterz, mathieu.desnoyers, hpa, rostedt, mingo
  Cc: tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi,
	davem, sam, ddaney, michael, linux-kernel

Introduce:

static __always_inline bool static_branch(struct jump_label_key *key)

to replace the old JUMP_LABEL(key, label) macro.

The new static_branch() simplifies the usage of jump labels. Since
static_branch() returns a boolean, it can be used as part of an if()
construct. It also allows us to drop the 'label' argument from the
prototype. It's probably best understood with an example; here is the part
of the patch that converts the tracepoints to use static_branch():

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -146,9 +146,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
-		JUMP_LABEL(&__tracepoint_##name.key, do_trace);		\
-		return;							\
-do_trace:								\
+		if (static_branch(&__tracepoint_##name.key))		\
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args));			\


I analyzed the code produced by static_branch(), and it seems to be
at least as good as the code generated by JUMP_LABEL(). As a reminder,
we get a single nop in the fastpath for -O2, but will often get
a 'double jmp' in the -Os case. That is, 'jmp 0', followed by a jmp around
the disabled code. We believe that future gcc tweaks to allow block
re-ordering in the -Os case will solve this.

I also saw a 1-2% tbench throughput improvement when compiling with
jump labels.

This patch also addresses a build issue that Tetsuo Handa reported where
gcc v3.3 currently chokes on compiling 'dynamic debug':

include/net/inet_connection_sock.h: In function `inet_csk_reset_xmit_timer':
include/net/inet_connection_sock.h:236: error: duplicate label declaration `do_printk'
include/net/inet_connection_sock.h:219: error: this is a previous declaration
include/net/inet_connection_sock.h:236: error: duplicate label declaration `out'
include/net/inet_connection_sock.h:219: error: this is a previous declaration
include/net/inet_connection_sock.h:236: error: duplicate label `do_printk'
include/net/inet_connection_sock.h:236: error: duplicate label `out'


Thanks to H. Peter Anvin for suggesting this improved syntax.

Suggested-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
---
 arch/sparc/include/asm/jump_label.h |   25 ++++++++++++++-----------
 arch/x86/include/asm/jump_label.h   |   22 +++++++++++++---------
 arch/x86/kernel/jump_label.c        |    2 +-
 include/linux/dynamic_debug.h       |   18 ++++--------------
 include/linux/jump_label.h          |   26 +++++++++++---------------
 include/linux/jump_label_ref.h      |   18 +++++++++++-------
 include/linux/perf_event.h          |   26 +++++++++++++-------------
 include/linux/tracepoint.h          |    4 +---
 kernel/jump_label.c                 |    2 +-
 9 files changed, 69 insertions(+), 74 deletions(-)

diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h
index 427d468..882651c 100644
--- a/arch/sparc/include/asm/jump_label.h
+++ b/arch/sparc/include/asm/jump_label.h
@@ -7,17 +7,20 @@
 
 #define JUMP_LABEL_NOP_SIZE 4
 
-#define JUMP_LABEL(key, label)					\
-	do {							\
-		asm goto("1:\n\t"				\
-			 "nop\n\t"				\
-			 "nop\n\t"				\
-			 ".pushsection __jump_table,  \"a\"\n\t"\
-			 ".align 4\n\t"				\
-			 ".word 1b, %l[" #label "], %c0\n\t"	\
-			 ".popsection \n\t"			\
-			 : :  "i" (key) :  : label);\
-	} while (0)
+static __always_inline bool __static_branch(struct jump_label_key *key)
+{
+		asm goto("1:\n\t"				
+			 "nop\n\t"				
+			 "nop\n\t"				
+			 ".pushsection __jump_table,  \"a\"\n\t"
+			 ".align 4\n\t"				
+			 ".word 1b, %l[l_yes], %c0\n\t"	
+			 ".popsection \n\t"			
+			 : :  "i" (key) :  : l_yes);
+	return false;
+l_yes:
+	return true;
+}
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index f52d42e..3d44a7c 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -5,20 +5,24 @@
 
 #include <linux/types.h>
 #include <asm/nops.h>
+#include <asm/asm.h>
 
 #define JUMP_LABEL_NOP_SIZE 5
 
 # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
 
-# define JUMP_LABEL(key, label)					\
-	do {							\
-		asm goto("1:"					\
-			JUMP_LABEL_INITIAL_NOP			\
-			".pushsection __jump_table,  \"a\" \n\t"\
-			_ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
-			".popsection \n\t"			\
-			: :  "i" (key) :  : label);		\
-	} while (0)
+static __always_inline bool __static_branch(struct jump_label_key *key)
+{
+	asm goto("1:"
+		JUMP_LABEL_INITIAL_NOP
+		".pushsection __jump_table,  \"a\" \n\t"
+		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
+		".popsection \n\t"
+		: :  "i" (key) : : l_yes );
+	return false;
+l_yes:
+	return true;
+}
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 961b6b3..dfa4c3c 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -4,13 +4,13 @@
  * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
  *
  */
-#include <linux/jump_label.h>
 #include <linux/memory.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
 #include <linux/list.h>
 #include <linux/jhash.h>
 #include <linux/cpu.h>
+#include <linux/jump_label.h>
 #include <asm/kprobes.h>
 #include <asm/alternative.h>
 
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index ddf7bae..2ade291 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -44,34 +44,24 @@ int ddebug_add_module(struct _ddebug *tab, unsigned int n,
 extern int ddebug_remove_module(const char *mod_name);
 
 #define dynamic_pr_debug(fmt, ...) do {					\
-	__label__ do_printk;						\
-	__label__ out;							\
 	static struct _ddebug descriptor				\
 	__used								\
 	__attribute__((section("__verbose"), aligned(8))) =		\
 	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
 		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
-	JUMP_LABEL(&descriptor.enabled, do_printk);			\
-	goto out;							\
-do_printk:								\
-	printk(KERN_DEBUG pr_fmt(fmt),	##__VA_ARGS__);			\
-out:	;								\
+	if (static_branch(&descriptor.enabled))				\
+		printk(KERN_DEBUG pr_fmt(fmt),	##__VA_ARGS__);		\
 	} while (0)
 
 
 #define dynamic_dev_dbg(dev, fmt, ...) do {				\
-	__label__ do_printk;						\
-	__label__ out;							\
 	static struct _ddebug descriptor				\
 	__used								\
 	__attribute__((section("__verbose"), aligned(8))) =		\
 	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
 		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
-	JUMP_LABEL(&descriptor.enabled, do_printk);			\
-	goto out;							\
-do_printk:								\
-	dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__);		\
-out:	;								\
+	if (static_branch(&descriptor.enabled))				\
+		dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__);	\
 	} while (0)
 
 #else
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 152f7de..0ad9c2e 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -22,6 +22,11 @@ struct module;
 
 #ifdef HAVE_JUMP_LABEL
 
+static __always_inline bool static_branch(struct jump_label_key *key)
+{
+	return __static_branch(key);
+}
+
 extern struct jump_entry __start___jump_table[];
 extern struct jump_entry __stop___jump_table[];
 
@@ -42,11 +47,12 @@ struct jump_label_key {
 	int state;
 };
 
-#define JUMP_LABEL(key, label)			\
-do {						\
-	if (unlikely(((struct jump_label_key *)key)->state))		\
-		goto label;			\
-} while (0)
+static __always_inline bool static_branch(struct jump_label_key *key)
+{
+	if (unlikely(key->state))
+		return true;
+	return false;
+}
 
 static inline int jump_label_enabled(struct jump_label_key *key)
 {
@@ -78,14 +84,4 @@ static inline void jump_label_unlock(void) {}
 
 #endif
 
-#define COND_STMT(key, stmt)					\
-do {								\
-	__label__ jl_enabled;					\
-	JUMP_LABEL_ELSE_ATOMIC_READ(key, jl_enabled);		\
-	if (0) {						\
-jl_enabled:							\
-		stmt;						\
-	}							\
-} while (0)
-
 #endif
diff --git a/include/linux/jump_label_ref.h b/include/linux/jump_label_ref.h
index 8a76e89..5178696 100644
--- a/include/linux/jump_label_ref.h
+++ b/include/linux/jump_label_ref.h
@@ -7,19 +7,23 @@
 struct jump_label_key_counter {
 	atomic_t ref;
 	struct jump_label_key key;
-}
+};
 
 #ifdef HAVE_JUMP_LABEL
 
-#define JUMP_LABEL_ELSE_ATOMIC_READ(key, label, counter) JUMP_LABEL(key, label)
+static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
+{
+	return __static_branch(key);
+}
 
 #else /* !HAVE_JUMP_LABEL */
 
-#define JUMP_LABEL_ELSE_ATOMIC_READ(key, label, counter)		\
-do {									\
-	if (unlikely(atomic_read((atomic_t *)counter)))			\
-		goto label;						\
-} while (0)
+static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
+{
+	if (unlikely(atomic_read(count)))
+		return true;
+	return false;
+}
 
 #endif /* HAVE_JUMP_LABEL */
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 94834ce..26fe115 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1029,32 +1029,32 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 {
 	struct pt_regs hot_regs;
 
-	JUMP_LABEL_ELSE_ATOMIC_READ(&perf_swevent_enabled[event_id].key,
-				    have_event,
-				    &perf_swevent_enabled[event_id].ref);
-	return;
-
-have_event:
-	if (!regs) {
-		perf_fetch_caller_regs(&hot_regs);
-		regs = &hot_regs;
+	if (static_branch_else_atomic_read(&perf_swevent_enabled[event_id].key,
+					 &perf_swevent_enabled[event_id].ref)) {
+		if (!regs) {
+			perf_fetch_caller_regs(&hot_regs);
+			regs = &hot_regs;
+		}
+		__perf_sw_event(event_id, nr, nmi, regs, addr);
 	}
-	__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
 extern struct jump_label_key_counter perf_task_events;
 
 static inline void perf_event_task_sched_in(struct task_struct *task)
 {
-	COND_STMT(&perf_task_events, __perf_event_task_sched_in(task));
+	if (static_branch_else_atomic_read(&perf_task_events.key,
+					   &perf_task_events.ref))
+		__perf_event_task_sched_in(task);
 }
 
 static inline
 void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
 {
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
-
-	COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
+	if (static_branch_else_atomic_read(&perf_task_events.key,
+					   &perf_task_events.ref))
+		__perf_event_task_sched_out(task, next);
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 2ff00e5..b95e99a 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -149,9 +149,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
-		JUMP_LABEL(&__tracepoint_##name.key, do_trace);		\
-		return;							\
-do_trace:								\
+		if (static_branch(&__tracepoint_##name.key))		\
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index b6d461c..b72d3cd 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -4,7 +4,6 @@
  * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
  *
  */
-#include <linux/jump_label.h>
 #include <linux/memory.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
@@ -13,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/err.h>
+#include <linux/jump_label.h>
 
 #ifdef HAVE_JUMP_LABEL
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
@ 2011-01-05 17:15   ` Frederic Weisbecker
  2011-01-05 17:46     ` Steven Rostedt
  2011-01-05 21:14     ` Jason Baron
  2011-01-05 17:32   ` David Daney
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 113+ messages in thread
From: Frederic Weisbecker @ 2011-01-05 17:15 UTC (permalink / raw)
  To: Jason Baron
  Cc: peterz, mathieu.desnoyers, hpa, rostedt, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, avi, davem, sam, ddaney,
	michael, linux-kernel

On Wed, Jan 05, 2011 at 10:43:12AM -0500, Jason Baron wrote:
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 152f7de..0ad9c2e 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -22,6 +22,11 @@ struct module;
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> +static __always_inline bool static_branch(struct jump_label_key *key)
> +{
> +	return __static_branch(key);

Not very important, but __static_branch() would be more self-explanatory
if it were called arch_static_branch().

> +}
> +
>  extern struct jump_entry __start___jump_table[];
>  extern struct jump_entry __stop___jump_table[];
>  
> @@ -42,11 +47,12 @@ struct jump_label_key {
>  	int state;
>  };
>  
> -#define JUMP_LABEL(key, label)			\
> -do {						\
> -	if (unlikely(((struct jump_label_key *)key)->state))		\
> -		goto label;			\
> -} while (0)
> +static __always_inline bool static_branch(struct jump_label_key *key)
> +{
> +	if (unlikely(key->state))
> +		return true;
> +	return false;
> +}
>  
>  static inline int jump_label_enabled(struct jump_label_key *key)
>  {
> @@ -78,14 +84,4 @@ static inline void jump_label_unlock(void) {}
>  
>  #endif
>  
> -#define COND_STMT(key, stmt)					\
> -do {								\
> -	__label__ jl_enabled;					\
> -	JUMP_LABEL_ELSE_ATOMIC_READ(key, jl_enabled);		\
> -	if (0) {						\
> -jl_enabled:							\
> -		stmt;						\
> -	}							\
> -} while (0)
> -
>  #endif
> diff --git a/include/linux/jump_label_ref.h b/include/linux/jump_label_ref.h
> index 8a76e89..5178696 100644
> --- a/include/linux/jump_label_ref.h
> +++ b/include/linux/jump_label_ref.h
> @@ -7,19 +7,23 @@
>  struct jump_label_key_counter {
>  	atomic_t ref;
>  	struct jump_label_key key;
> -}
> +};
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> -#define JUMP_LABEL_ELSE_ATOMIC_READ(key, label, counter) JUMP_LABEL(key, label)
> +static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
> +{
> +	return __static_branch(key);
> +}

How about having only static_branch(), with the key handled solely
by way of get()/put()?

Simple boolean key enablement would work in this scheme, as would branches
based on a refcount. Users could then avoid maintaining both a key and a count,
since the jump label API would handle that transparently.

Or am I missing something?

Other than that, looks like a very nice patch!
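
The get()/put() idea above can be sketched in plain userspace C. This is only
a model under stated assumptions: jump_label_get() and jump_label_put() are
hypothetical names, the 'state' field stands in for the patched branch site,
and a real implementation would need a lock around the 0 <-> 1 transitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical key that carries its own reference count. */
struct jump_label_key {
	atomic_int refs;	/* number of active users */
	int state;		/* models the patched branch site */
};

/* The first user switches the branch on. */
static void jump_label_get(struct jump_label_key *key)
{
	if (atomic_fetch_add(&key->refs, 1) == 0)
		key->state = 1;	/* the kernel would patch the code here */
}

/* The last user switches the branch off again. */
static void jump_label_put(struct jump_label_key *key)
{
	int old = atomic_fetch_sub(&key->refs, 1);

	assert(old > 0);	/* catches an unbalanced put() */
	if (old == 1)
		key->state = 0;
}

/* Fast path: callers only ever test the key, never the count. */
static inline bool static_branch(const struct jump_label_key *key)
{
	return key->state != 0;
}
```

In this model a plain boolean user is simply a user that calls get() once and
put() once, so both usage styles share one entry point.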

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 1/2] jump label: make enable/disable o(1)
  2011-01-05 15:43 ` [PATCH 1/2] jump label: make enable/disable o(1) Jason Baron
@ 2011-01-05 17:31   ` Steven Rostedt
  2011-01-05 21:19     ` Jason Baron
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2011-01-05 17:31 UTC (permalink / raw)
  To: Jason Baron
  Cc: peterz, mathieu.desnoyers, hpa, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Wed, 2011-01-05 at 10:43 -0500, Jason Baron wrote:
>  
> +struct jump_label_key {
> +	int state;
> +};
> +
>  #define JUMP_LABEL(key, label)			\
>  do {						\
> -	if (unlikely(*key))			\
> +	if (unlikely(((struct jump_label_key *)key)->state))		\
>  		goto label;			\
>  } while (0)

Anything that uses JUMP_LABEL() should pass in a pointer to a struct
jump_label_key. Hence, remove the typecast; it can only lead to
hard-to-find bugs.

-- Steve
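
A minimal sketch of the macro without the cast, assuming GCC or Clang for
__builtin_expect(). The struct counted_key and event_enabled() names are
hypothetical and exist only to show a caller that holds an int next to the key.

```c
#include <stdbool.h>

#define unlikely(x) __builtin_expect(!!(x), 0)	/* GCC/Clang builtin */

struct jump_label_key {
	int state;
};

/* No cast: 'key' must really point to a struct jump_label_key. */
#define JUMP_LABEL(key, label)			\
do {						\
	if (unlikely((key)->state))		\
		goto label;			\
} while (0)

/* Hypothetical caller that keeps a counter next to the key. */
struct counted_key {
	int ref;
	struct jump_label_key key;
};

static bool event_enabled(struct counted_key *ck)
{
	JUMP_LABEL(&ck->key, enabled);
	/*
	 * JUMP_LABEL(&ck->ref, enabled) would be rejected by the compiler,
	 * because an int has no 'state' member. With the cast in place it
	 * would compile and test the counter instead of the key.
	 */
	return false;
enabled:
	return true;
}
```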



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
  2011-01-05 17:15   ` Frederic Weisbecker
@ 2011-01-05 17:32   ` David Daney
  2011-01-05 17:43     ` Steven Rostedt
  2011-01-05 21:16     ` Jason Baron
  2011-01-05 17:41   ` Steven Rostedt
  2011-01-09 18:48   ` Mathieu Desnoyers
  3 siblings, 2 replies; 113+ messages in thread
From: David Daney @ 2011-01-05 17:32 UTC (permalink / raw)
  To: Jason Baron, Ralf Baechle
  Cc: peterz, mathieu.desnoyers, hpa, rostedt, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam,
	michael, linux-kernel

On 01/05/2011 07:43 AM, Jason Baron wrote:
> Introduce:
>
> static __always_inline bool static_branch(struct jump_label_key *key)
>
> to replace the old JUMP_LABEL(key, label) macro.
>
> The new static_branch() simplifies the usage of jump labels. Since
> static_branch() returns a boolean, it can be used as part of an if()
> construct. It also allows us to drop the 'label' argument from the
> prototype. It's probably best understood with an example; here is the part
> of the patch that converts the tracepoints to use static_branch():
>
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -146,9 +146,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
>   	extern struct tracepoint __tracepoint_##name;			\
>   	static inline void trace_##name(proto)				\
>   	{								\
> -		JUMP_LABEL(&__tracepoint_##name.key, do_trace);		\
> -		return;							\
> -do_trace:								\
> +		if (static_branch(&__tracepoint_##name.key))		\
>   			__DO_TRACE(&__tracepoint_##name,		\
>   				TP_PROTO(data_proto),			\
>   				TP_ARGS(data_args));			\
>
>
> I analyzed the code produced by static_branch(), and it seems to be
> at least as good as the code generated by JUMP_LABEL(). As a reminder,
> we get a single nop in the fastpath for -O2, but will often get
> a 'double jmp' in the -Os case: that is, 'jmp 0', followed by a jmp around
> the disabled code. We believe that future gcc tweaks to allow block
> re-ordering with -Os will solve the -Os case.
>
> I also saw a 1-2% tbench throughput improvement when compiling with
> jump labels.
>
> This patch also addresses a build issue that Tetsuo Handa reported where
> gcc v3.3 currently chokes on compiling 'dynamic debug':
>
> include/net/inet_connection_sock.h: In function `inet_csk_reset_xmit_timer':
> include/net/inet_connection_sock.h:236: error: duplicate label declaration `do_printk'
> include/net/inet_connection_sock.h:219: error: this is a previous declaration
> include/net/inet_connection_sock.h:236: error: duplicate label declaration `out'
> include/net/inet_connection_sock.h:219: error: this is a previous declaration
> include/net/inet_connection_sock.h:236: error: duplicate label `do_printk'
> include/net/inet_connection_sock.h:236: error: duplicate label `out'
>
>
> Thanks to H. Peter Anvin for suggesting this improved syntax.
>
> Suggested-by: H. Peter Anvin<hpa@linux.intel.com>
> Signed-off-by: Jason Baron<jbaron@redhat.com>
> Tested-by: Tetsuo Handa<penguin-kernel@i-love.sakura.ne.jp>
> ---
>   arch/sparc/include/asm/jump_label.h |   25 ++++++++++++++-----------
>   arch/x86/include/asm/jump_label.h   |   22 +++++++++++++---------
>   arch/x86/kernel/jump_label.c        |    2 +-
>   include/linux/dynamic_debug.h       |   18 ++++--------------
>   include/linux/jump_label.h          |   26 +++++++++++---------------
>   include/linux/jump_label_ref.h      |   18 +++++++++++-------
>   include/linux/perf_event.h          |   26 +++++++++++++-------------
>   include/linux/tracepoint.h          |    4 +---
>   kernel/jump_label.c                 |    2 +-
>   9 files changed, 69 insertions(+), 74 deletions(-)
>
[...]

This patch will conflict with the MIPS jump label support that Ralf has 
queued up for 2.6.38.

David Daney

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
  2011-01-05 17:15   ` Frederic Weisbecker
  2011-01-05 17:32   ` David Daney
@ 2011-01-05 17:41   ` Steven Rostedt
  2011-01-09 18:48   ` Mathieu Desnoyers
  3 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-01-05 17:41 UTC (permalink / raw)
  To: Jason Baron
  Cc: peterz, mathieu.desnoyers, hpa, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Wed, 2011-01-05 at 10:43 -0500, Jason Baron wrote:
> Introduce:
> 
> static __always_inline bool static_branch(struct jump_label_key *key)
> 
> to replace the old JUMP_LABEL(key, label) macro.
> 
> The new static_branch() simplifies the usage of jump labels. Since
> static_branch() returns a boolean, it can be used as part of an if()
> construct. It also allows us to drop the 'label' argument from the
> prototype. It's probably best understood with an example; here is the part
> of the patch that converts the tracepoints to use static_branch():
> 
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -146,9 +146,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
>  	extern struct tracepoint __tracepoint_##name;			\
>  	static inline void trace_##name(proto)				\
>  	{								\
> -		JUMP_LABEL(&__tracepoint_##name.key, do_trace);		\
> -		return;							\
> -do_trace:								\
> +		if (static_branch(&__tracepoint_##name.key))		\
>  			__DO_TRACE(&__tracepoint_##name,		\
>  				TP_PROTO(data_proto),			\
>  				TP_ARGS(data_args));			\

BTW, do not put real diffs in the change log; at the very least, remove the
diff header from it. A real diff can confuse tools that pull in patches from
mailing lists, and the change is already visible in the patch itself.

Thanks,

-- Steve

> 
> 
> I analyzed the code produced by static_branch(), and it seems to be
> at least as good as the code generated by JUMP_LABEL(). As a reminder,
> we get a single nop in the fastpath for -O2, but will often get
> a 'double jmp' in the -Os case: that is, 'jmp 0', followed by a jmp around
> the disabled code. We believe that future gcc tweaks to allow block
> re-ordering with -Os will solve the -Os case.
> 
> I also saw a 1-2% tbench throughput improvement when compiling with
> jump labels.
> 
> This patch also addresses a build issue that Tetsuo Handa reported where
> gcc v3.3 currently chokes on compiling 'dynamic debug':



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:32   ` David Daney
@ 2011-01-05 17:43     ` Steven Rostedt
  2011-01-05 18:44       ` David Miller
  2011-01-05 18:56       ` H. Peter Anvin
  2011-01-05 21:16     ` Jason Baron
  1 sibling, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-01-05 17:43 UTC (permalink / raw)
  To: David Daney
  Cc: Jason Baron, Ralf Baechle, peterz, mathieu.desnoyers, hpa, mingo,
	tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi,
	davem, sam, michael, linux-kernel

On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:

> This patch will conflict with the MIPS jump label support that Ralf has 
> queued up for 2.6.38.

Can you disable that support for now? As Linus said at Kernel Summit,
other archs jumped too quickly onto the jump label bandwagon. This
change really needs to get in, and IMO, it is more critical to clean up
the jump label code than to have other archs implementing it.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:15   ` Frederic Weisbecker
@ 2011-01-05 17:46     ` Steven Rostedt
  2011-01-05 18:52       ` H. Peter Anvin
  2011-01-05 21:14     ` Jason Baron
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2011-01-05 17:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jason Baron, peterz, mathieu.desnoyers, hpa, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, avi, davem, sam, ddaney,
	michael, linux-kernel

On Wed, 2011-01-05 at 18:15 +0100, Frederic Weisbecker wrote:
> On Wed, Jan 05, 2011 at 10:43:12AM -0500, Jason Baron wrote:
> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> > index 152f7de..0ad9c2e 100644
> > --- a/include/linux/jump_label.h
> > +++ b/include/linux/jump_label.h
> > @@ -22,6 +22,11 @@ struct module;
> >  
> >  #ifdef HAVE_JUMP_LABEL
> >  
> > +static __always_inline bool static_branch(struct jump_label_key *key)
> > +{
> > +	return __static_branch(key);
> 
> Not very important, but __static_branch() would be more self-explained
> if it was called arch_static_branch().

I disagree, I think it is very important ;-)

Yes, the kernel has been moving to adding "arch_" to functions that are
implemented separately by each arch. Please change this to
"arch_static_branch()".

Thanks,

-- Steve
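
The layering behind the requested rename can be sketched as follows. This is
a userspace model, not the kernel code: the real arch_static_branch() would
live in each architecture's asm/jump_label.h and emit a patchable instruction,
whereas here it is modeled as a plain load of 'state'.

```c
#include <stdbool.h>

struct jump_label_key {
	int state;
};

/*
 * Arch layer. In the kernel each architecture supplies its own version;
 * this stand-in just reads the flag.
 */
static inline bool arch_static_branch(const struct jump_label_key *key)
{
	return key->state != 0;
}

/* Generic layer: the only name that callers are meant to use. */
static inline bool static_branch(const struct jump_label_key *key)
{
	return arch_static_branch(key);
}
```

With the "arch_" prefix, a reader of the generic header can tell at a glance
that the definition lives under arch/ and who maintains it.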



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:43     ` Steven Rostedt
@ 2011-01-05 18:44       ` David Miller
  2011-01-05 20:04         ` Steven Rostedt
  2011-01-05 18:56       ` H. Peter Anvin
  1 sibling, 1 reply; 113+ messages in thread
From: David Miller @ 2011-01-05 18:44 UTC (permalink / raw)
  To: rostedt
  Cc: ddaney, jbaron, ralf, peterz, mathieu.desnoyers, hpa, mingo,
	tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	michael, linux-kernel

From: Steven Rostedt <rostedt@goodmis.org>
Date: Wed, 05 Jan 2011 12:43:59 -0500

> On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
> 
>> This patch will conflict with the MIPS jump label support that Ralf has 
>> queued up for 2.6.38.
> 
> Can you disable that support for now? As Linus said at Kernel Summit,
> other archs jumped too quickly onto the jump label band wagon.

I totally disagree with this assessment.  Implementing jump label for
sparc64 as early as possible found so much broken stuff that otherwise
would have been merged before any other architecture tried supporting
it.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:46     ` Steven Rostedt
@ 2011-01-05 18:52       ` H. Peter Anvin
  2011-01-05 21:19         ` Jason Baron
  0 siblings, 1 reply; 113+ messages in thread
From: H. Peter Anvin @ 2011-01-05 18:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Jason Baron, peterz, mathieu.desnoyers,
	mingo, tglx, andi, roland, rth, masami.hiramatsu.pt, avi, davem,
	sam, ddaney, michael, linux-kernel

On 01/05/2011 09:46 AM, Steven Rostedt wrote:
> On Wed, 2011-01-05 at 18:15 +0100, Frederic Weisbecker wrote:
>> On Wed, Jan 05, 2011 at 10:43:12AM -0500, Jason Baron wrote:
>>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>>> index 152f7de..0ad9c2e 100644
>>> --- a/include/linux/jump_label.h
>>> +++ b/include/linux/jump_label.h
>>> @@ -22,6 +22,11 @@ struct module;
>>>  
>>>  #ifdef HAVE_JUMP_LABEL
>>>  
>>> +static __always_inline bool static_branch(struct jump_label_key *key)
>>> +{
>>> +	return __static_branch(key);
>>
>> Not very important, but __static_branch() would be more self-explained
>> if it was called arch_static_branch().
> 
> I disagree, I think it is very important ;-)
> 
> Yes, the kernel has been moving to adding "arch_" to functions that are
> implemented dependently by different archs. Please change this to
> "arch_static_branch()".
> 

Indeed.  This hugely simplifies knowing where to look and whose
responsibility it is.

	-hpa

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:43     ` Steven Rostedt
  2011-01-05 18:44       ` David Miller
@ 2011-01-05 18:56       ` H. Peter Anvin
  2011-01-05 19:14         ` Ingo Molnar
  1 sibling, 1 reply; 113+ messages in thread
From: H. Peter Anvin @ 2011-01-05 18:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Daney, Jason Baron, Ralf Baechle, peterz,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, michael,
	linux-kernel

On 01/05/2011 09:43 AM, Steven Rostedt wrote:
> On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
> 
>> This patch will conflict with the MIPS jump label support that Ralf has 
>> queued up for 2.6.38.
> 
> Can you disable that support for now? As Linus said at Kernel Summit,
> other archs jumped too quickly onto the jump label band wagon. This
> change really needs to get in, and IMO, it is more critical to clean up
> the jump label code than to have other archs implementing it.
> 

Ralf is really good... perhaps we can get the conflicts resolved?

	-hpa

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 18:56       ` H. Peter Anvin
@ 2011-01-05 19:14         ` Ingo Molnar
  2011-01-05 19:32           ` David Daney
  0 siblings, 1 reply; 113+ messages in thread
From: Ingo Molnar @ 2011-01-05 19:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, David Daney, Jason Baron, Ralf Baechle, peterz,
	mathieu.desnoyers, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, davem, sam, michael, linux-kernel


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 01/05/2011 09:43 AM, Steven Rostedt wrote:
> > On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
> > 
> >> This patch will conflict with the MIPS jump label support that Ralf has 
> >> queued up for 2.6.38.
> > 
> > Can you disable that support for now? As Linus said at Kernel Summit,
> > other archs jumped too quickly onto the jump label band wagon. This
> > change really needs to get in, and IMO, it is more critical to clean up
> > the jump label code than to have other archs implementing it.
> > 
> 
> Ralf is really good... perhaps we can get the conflicts resolved?

Yep, the best Git-ish way to handle that is to resolve the conflicts whenever they 
happen - i.e. whoever merges his tree upstream later. No need for anyone to 'wait' 
or undo anything.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 19:14         ` Ingo Molnar
@ 2011-01-05 19:32           ` David Daney
  2011-01-05 19:50             ` Ingo Molnar
  0 siblings, 1 reply; 113+ messages in thread
From: David Daney @ 2011-01-05 19:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Steven Rostedt, Jason Baron, Ralf Baechle,
	peterz, mathieu.desnoyers, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, michael,
	linux-kernel

On 01/05/2011 11:14 AM, Ingo Molnar wrote:
>
> * H. Peter Anvin<hpa@zytor.com>  wrote:
>
>> On 01/05/2011 09:43 AM, Steven Rostedt wrote:
>>> On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
>>>
>>>> This patch will conflict with the MIPS jump label support that Ralf has
>>>> queued up for 2.6.38.
>>>
>>> Can you disable that support for now? As Linus said at Kernel Summit,
>>> other archs jumped too quickly onto the jump label band wagon. This
>>> change really needs to get in, and IMO, it is more critical to clean up
>>> the jump label code than to have other archs implementing it.
>>>
>>
>> Ralf is really good... perhaps we can get the conflicts resolved?
>
> Yep, the best Git-ish way to handle that is to resolve the conflicts whenever they
> happen - i.e. whoever merges his tree upstream later. No need for anyone to 'wait'
> or undo anything.
>

There will be no git conflicts, as the affected files are disjoint.  The
conflict will instead show up as a build failure for MIPS, which is why
I raised the issue.

No matter I guess.  We will undoubtedly have many -rc releases in which 
we can merge any required adjustments.

David Daney

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 19:32           ` David Daney
@ 2011-01-05 19:50             ` Ingo Molnar
  2011-01-05 20:07               ` David Daney
  0 siblings, 1 reply; 113+ messages in thread
From: Ingo Molnar @ 2011-01-05 19:50 UTC (permalink / raw)
  To: David Daney
  Cc: H. Peter Anvin, Steven Rostedt, Jason Baron, Ralf Baechle,
	peterz, mathieu.desnoyers, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, michael,
	linux-kernel


* David Daney <ddaney@caviumnetworks.com> wrote:

> On 01/05/2011 11:14 AM, Ingo Molnar wrote:
> >
> >* H. Peter Anvin<hpa@zytor.com>  wrote:
> >
> >>On 01/05/2011 09:43 AM, Steven Rostedt wrote:
> >>>On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
> >>>
> >>>>This patch will conflict with the MIPS jump label support that Ralf has
> >>>>queued up for 2.6.38.
> >>>
> >>>Can you disable that support for now? As Linus said at Kernel Summit,
> >>>other archs jumped too quickly onto the jump label band wagon. This
> >>>change really needs to get in, and IMO, it is more critical to clean up
> >>>the jump label code than to have other archs implementing it.
> >>>
> >>
> >>Ralf is really good... perhaps we can get the conflicts resolved?
> >
> >Yep, the best Git-ish way to handle that is to resolve the conflicts whenever they
> >happen - i.e. whoever merges his tree upstream later. No need for anyone to 'wait'
> >or undo anything.
> >
> 
> There will be no git conflicts, as the affected files are disjoint.

I regularly resolve semantic conflicts in merge commits - or in the first followup 
commit.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 18:44       ` David Miller
@ 2011-01-05 20:04         ` Steven Rostedt
  0 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-01-05 20:04 UTC (permalink / raw)
  To: David Miller
  Cc: ddaney, jbaron, ralf, peterz, mathieu.desnoyers, hpa, mingo,
	tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	michael, linux-kernel

On Wed, 2011-01-05 at 10:44 -0800, David Miller wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Wed, 05 Jan 2011 12:43:59 -0500
> 
> > On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
> > 
> >> This patch will conflict with the MIPS jump label support that Ralf has 
> >> queued up for 2.6.38.
> > 
> > Can you disable that support for now? As Linus said at Kernel Summit,
> > other archs jumped too quickly onto the jump label band wagon.
> 
> I totally disagree with this assessment.  Implementing jump label for
> sparc64 as early as possible found so much broken stuff that otherwise
> would have been merged before any other architecture tried supporting
> it.

The issue here is that jump labels went in too fast. And I agree that
having it ported to all archs is/was important. But the infrastructure
needs to be cleaned up.

Probably best to work out the kinks in linux-next as opposed to mainline.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 19:50             ` Ingo Molnar
@ 2011-01-05 20:07               ` David Daney
  2011-01-05 20:08                 ` H. Peter Anvin
  2011-01-05 20:18                 ` Ingo Molnar
  0 siblings, 2 replies; 113+ messages in thread
From: David Daney @ 2011-01-05 20:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Steven Rostedt, Jason Baron, Ralf Baechle,
	peterz, mathieu.desnoyers, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, michael,
	linux-kernel

On 01/05/2011 11:50 AM, Ingo Molnar wrote:
>
> * David Daney<ddaney@caviumnetworks.com>  wrote:
>
>> On 01/05/2011 11:14 AM, Ingo Molnar wrote:
>>>
>>> * H. Peter Anvin<hpa@zytor.com>   wrote:
>>>
>>>> On 01/05/2011 09:43 AM, Steven Rostedt wrote:
>>>>> On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
>>>>>
>>>>>> This patch will conflict with the MIPS jump label support that Ralf has
>>>>>> queued up for 2.6.38.
>>>>>
>>>>> Can you disable that support for now? As Linus said at Kernel Summit,
>>>>> other archs jumped too quickly onto the jump label band wagon. This
>>>>> change really needs to get in, and IMO, it is more critical to clean up
>>>>> the jump label code than to have other archs implementing it.
>>>>>
>>>>
>>>> Ralf is really good... perhaps we can get the conflicts resolved?
>>>
>>> Yep, the best Git-ish way to handle that is to resolve the conflicts whenever they
>>> happen - i.e. whoever merges his tree upstream later. No need for anyone to 'wait'
>>> or undo anything.
>>>
>>
>> There will be no git conflicts, as the affected files are disjoint.
>
> I regularly resolve semantic conflicts in merge commits - or in the first followup
> commit.
>

But I am guessing that neither you, nor Linus, regularly build MIPS 
kernels with GCC-4.5.x *and* jump label support enabled.  So how would 
such semantic conflict ever be detected?  I would expect the conflict to 
first occur when Linus pulls Ralf's tree.

I don't expect anybody to magically fix such things, so whatever 
happens, I will test it and submit patches if required.

Thanks,
David Daney

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 20:07               ` David Daney
@ 2011-01-05 20:08                 ` H. Peter Anvin
  2011-01-05 20:18                 ` Ingo Molnar
  1 sibling, 0 replies; 113+ messages in thread
From: H. Peter Anvin @ 2011-01-05 20:08 UTC (permalink / raw)
  To: David Daney
  Cc: Ingo Molnar, Steven Rostedt, Jason Baron, Ralf Baechle, peterz,
	mathieu.desnoyers, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, davem, sam, michael, linux-kernel

On 01/05/2011 12:07 PM, David Daney wrote:
> 
> But I am guessing that neither you, nor Linus, regularly build MIPS 
> kernels with GCC-4.5.x *and* jump label support enabled.  So how would 
> such semantic conflict ever be detected?  I would expect the conflict to 
> first occur when Linus pulls Ralf's tree.
> 
> I don't expect anybody to magically fix such things, so whatever 
> happens, I will test it and submit patches if required.
> 

If Ralf knows to expect them, then Ralf can take corrective actions as
he thinks is appropriate.

	-hpa

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 20:07               ` David Daney
  2011-01-05 20:08                 ` H. Peter Anvin
@ 2011-01-05 20:18                 ` Ingo Molnar
  1 sibling, 0 replies; 113+ messages in thread
From: Ingo Molnar @ 2011-01-05 20:18 UTC (permalink / raw)
  To: David Daney
  Cc: H. Peter Anvin, Steven Rostedt, Jason Baron, Ralf Baechle,
	peterz, mathieu.desnoyers, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, michael,
	linux-kernel


* David Daney <ddaney@caviumnetworks.com> wrote:

> On 01/05/2011 11:50 AM, Ingo Molnar wrote:
> >
> >* David Daney<ddaney@caviumnetworks.com>  wrote:
> >
> >>On 01/05/2011 11:14 AM, Ingo Molnar wrote:
> >>>
> >>>* H. Peter Anvin<hpa@zytor.com>   wrote:
> >>>
> >>>>On 01/05/2011 09:43 AM, Steven Rostedt wrote:
> >>>>>On Wed, 2011-01-05 at 09:32 -0800, David Daney wrote:
> >>>>>
> >>>>>>This patch will conflict with the MIPS jump label support that Ralf has
> >>>>>>queued up for 2.6.38.
> >>>>>
> >>>>>Can you disable that support for now? As Linus said at Kernel Summit,
> >>>>>other archs jumped too quickly onto the jump label band wagon. This
> >>>>>change really needs to get in, and IMO, it is more critical to clean up
> >>>>>the jump label code than to have other archs implementing it.
> >>>>>
> >>>>
> >>>>Ralf is really good... perhaps we can get the conflicts resolved?
> >>>
> >>>Yep, the best Git-ish way to handle that is to resolve the conflicts whenever they
> >>>happen - i.e. whoever merges his tree upstream later. No need for anyone to 'wait'
> >>>or undo anything.
> >>>
> >>
> >>There will be no git conflicts, as the affected files are disjoint.
> >
> >I regularly resolve semantic conflicts in merge commits - or in the first followup
> >commit.
> >
> 
> But I am guessing that neither you, nor Linus, regularly build MIPS
> kernels with GCC-4.5.x *and* jump label support enabled.  [...]

I build MIPS defconfig kernels at least once per day - so at least serious, 
wide-ranging issues should not slip through. Rarer combos possibly - but that's true 
of pretty much anything.

> [...] So how would such semantic conflict ever be detected?  I would expect the 
> conflict to first occur when Linus pulls Ralf's tree.

If that slips through then a fix is queued up?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:15   ` Frederic Weisbecker
  2011-01-05 17:46     ` Steven Rostedt
@ 2011-01-05 21:14     ` Jason Baron
  1 sibling, 0 replies; 113+ messages in thread
From: Jason Baron @ 2011-01-05 21:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: peterz, mathieu.desnoyers, hpa, rostedt, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, avi, davem, sam, ddaney,
	michael, linux-kernel

On Wed, Jan 05, 2011 at 06:15:18PM +0100, Frederic Weisbecker wrote:
> On Wed, Jan 05, 2011 at 10:43:12AM -0500, Jason Baron wrote:
> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> > index 152f7de..0ad9c2e 100644
> > --- a/include/linux/jump_label.h
> > +++ b/include/linux/jump_label.h
> > @@ -22,6 +22,11 @@ struct module;
> >  
> >  #ifdef HAVE_JUMP_LABEL
> >  
> > +static __always_inline bool static_branch(struct jump_label_key *key)
> > +{
> > +	return __static_branch(key);
> 
> Not very important, but __static_branch() would be more self-explained
> if it was called arch_static_branch().
> 
> > +}
> > +
> >  extern struct jump_entry __start___jump_table[];
> >  extern struct jump_entry __stop___jump_table[];
> >  
> > @@ -42,11 +47,12 @@ struct jump_label_key {
> >  	int state;
> >  };
> >  
> > -#define JUMP_LABEL(key, label)			\
> > -do {						\
> > -	if (unlikely(((struct jump_label_key *)key)->state))		\
> > -		goto label;			\
> > -} while (0)
> > +static __always_inline bool static_branch(struct jump_label_key *key)
> > +{
> > +	if (unlikely(key->state))
> > +		return true;
> > +	return false;
> > +}
> >  
> >  static inline int jump_label_enabled(struct jump_label_key *key)
> >  {
> > @@ -78,14 +84,4 @@ static inline void jump_label_unlock(void) {}
> >  
> >  #endif
> >  
> > -#define COND_STMT(key, stmt)					\
> > -do {								\
> > -	__label__ jl_enabled;					\
> > -	JUMP_LABEL_ELSE_ATOMIC_READ(key, jl_enabled);		\
> > -	if (0) {						\
> > -jl_enabled:							\
> > -		stmt;						\
> > -	}							\
> > -} while (0)
> > -
> >  #endif
> > diff --git a/include/linux/jump_label_ref.h b/include/linux/jump_label_ref.h
> > index 8a76e89..5178696 100644
> > --- a/include/linux/jump_label_ref.h
> > +++ b/include/linux/jump_label_ref.h
> > @@ -7,19 +7,23 @@
> >  struct jump_label_key_counter {
> >  	atomic_t ref;
> >  	struct jump_label_key key;
> > -}
> > +};
> >  
> >  #ifdef HAVE_JUMP_LABEL
> >  
> > -#define JUMP_LABEL_ELSE_ATOMIC_READ(key, label, counter) JUMP_LABEL(key, label)
> > +static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
> > +{
> > +	return __static_branch(key);
> > +}
> 
> How about having only static_branch() but the key would be handled only
> by ways of get()/put().
> 
> Simple boolean key enablement would work in this scheme as well as branches
> based on refcount. So that the users could avoid maintaining both key and count,
> this would be transparently handled by the jump label API.
> 
> Or am I missing something?
> 

right. this is a good point. I had 'jump_label_inc()' and
'jump_label_dec()' essentially providing this. However, when jump labels
are disabled we didn't want to incur an atomic_read() everywhere.
Furthermore, the use of the atomic_t type within jump_label.h causes
#include dependency problems, since atomic.h ends up including
jump_label.h...

Thus, what I've proposed here is to have the very simple
jump_label_enable()/disable(), and leave reference counting to the
caller.
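
A minimal userspace sketch of what "leave reference counting to the
caller" could look like. It assumes C11 atomics as a stand-in for the
kernel's atomic_t, and every demo_* name is hypothetical, not part of the
patch. The key only knows on/off; the user wraps it with its own count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the fallback (!HAVE_JUMP_LABEL) struct jump_label_key. */
struct demo_key {
	int state;
};

/* Stand-ins for jump_label_enable()/jump_label_disable(): no counting here. */
static inline void demo_enable(struct demo_key *key)  { key->state = 1; }
static inline void demo_disable(struct demo_key *key) { key->state = 0; }

/* Fallback form of static_branch(): a plain load and test. */
static inline bool demo_static_branch(struct demo_key *key)
{
	return key->state != 0;
}

/* Hypothetical caller-side state: the user owns the reference count. */
struct demo_user {
	atomic_int ref;
	struct demo_key key;
};

/* Taking the first reference switches the branch on. */
static inline void demo_user_get(struct demo_user *u)
{
	if (atomic_fetch_add(&u->ref, 1) == 0)
		demo_enable(&u->key);
}

/* Dropping the last reference switches the branch off. */
static inline void demo_user_put(struct demo_user *u)
{
	if (atomic_fetch_sub(&u->ref, 1) == 1)
		demo_disable(&u->key);
}

/* Two references are taken and dropped; the branch follows the count. */
void demo_refcount_usage(void)
{
	struct demo_user u = { .ref = 0, .key = { .state = 0 } };

	demo_user_get(&u);
	demo_user_get(&u);
	assert(demo_static_branch(&u.key));
	demo_user_put(&u);
	assert(demo_static_branch(&u.key));	/* one reference still held */
	demo_user_put(&u);
	assert(!demo_static_branch(&u.key));
}
```

The sketch ignores the serialization a real user would need between a
concurrent get and put; it only shows where the count would live.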

thanks,

-Jason


* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 17:32   ` David Daney
  2011-01-05 17:43     ` Steven Rostedt
@ 2011-01-05 21:16     ` Jason Baron
  1 sibling, 0 replies; 113+ messages in thread
From: Jason Baron @ 2011-01-05 21:16 UTC (permalink / raw)
  To: David Daney
  Cc: Ralf Baechle, peterz, mathieu.desnoyers, hpa, rostedt, mingo,
	tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi,
	davem, sam, michael, linux-kernel

On Wed, Jan 05, 2011 at 09:32:11AM -0800, David Daney wrote:
> On 01/05/2011 07:43 AM, Jason Baron wrote:
>> Introduce:
>>
>> static __always_inline bool static_branch(struct jump_label_key *key)
>>
>> to replace the old JUMP_LABEL(key, label) macro.
>>
>> The new static_branch(), simplifies the usage of jump labels. Since,
>> static_branch() returns a boolean, it can be used as part of an if()
>> construct. It also, allows us to drop the 'label' argument from the
>> prototype. Its probably best understood with an example, here is the part
>> of the patch that converts the tracepoints to use unlikely_switch():
>>
>> --- a/include/linux/tracepoint.h
>> +++ b/include/linux/tracepoint.h
>> @@ -146,9 +146,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
>>   	extern struct tracepoint __tracepoint_##name;			\
>>   	static inline void trace_##name(proto)				\
>>   	{								\
>> -		JUMP_LABEL(&__tracepoint_##name.key, do_trace);		\
>> -		return;							\
>> -do_trace:								\
>> +		if (static_branch(&__tracepoint_##name.key))		\
>>   			__DO_TRACE(&__tracepoint_##name,		\
>>   				TP_PROTO(data_proto),			\
>>   				TP_ARGS(data_args));			\
>>
>>
>> I analyzed the code produced by static_branch(), and it seems to be
>> at least as good as the code generated by the JUMP_LABEL(). As a reminder,
>> we get a single nop in the fastpath for -02. But will often times get
>> a 'double jmp' in the -Os case. That is, 'jmp 0', followed by a jmp around
>> the disabled code. We believe that future gcc tweaks to allow block
>> re-ordering in the -Os, will solve the -Os case in the future.
>>
>> I also saw a 1-2% tbench throughput improvement when compiling with
>> jump labels.
>>
>> This patch also addresses a build issue that Tetsuo Handa reported where
>> gcc v3.3 currently chokes on compiling 'dynamic debug':
>>
>> include/net/inet_connection_sock.h: In function `inet_csk_reset_xmit_timer':
>> include/net/inet_connection_sock.h:236: error: duplicate label declaration `do_printk'
>> include/net/inet_connection_sock.h:219: error: this is a previous declaration
>> include/net/inet_connection_sock.h:236: error: duplicate label declaration `out'
>> include/net/inet_connection_sock.h:219: error: this is a previous declaration
>> include/net/inet_connection_sock.h:236: error: duplicate label `do_printk'
>> include/net/inet_connection_sock.h:236: error: duplicate label `out'
>>
>>
>> Thanks to H. Peter Anvin for suggesting this improved syntax.
>>
>> Suggested-by: H. Peter Anvin<hpa@linux.intel.com>
>> Signed-off-by: Jason Baron<jbaron@redhat.com>
>> Tested-by: Tetsuo Handa<penguin-kernel@i-love.sakura.ne.jp>
>> ---
>>   arch/sparc/include/asm/jump_label.h |   25 ++++++++++++++-----------
>>   arch/x86/include/asm/jump_label.h   |   22 +++++++++++++---------
>>   arch/x86/kernel/jump_label.c        |    2 +-
>>   include/linux/dynamic_debug.h       |   18 ++++--------------
>>   include/linux/jump_label.h          |   26 +++++++++++---------------
>>   include/linux/jump_label_ref.h      |   18 +++++++++++-------
>>   include/linux/perf_event.h          |   26 +++++++++++++-------------
>>   include/linux/tracepoint.h          |    4 +---
>>   kernel/jump_label.c                 |    2 +-
>>   9 files changed, 69 insertions(+), 74 deletions(-)
>>
> [...]
>
> This patch will conflict with the MIPS jump label support that Ralf has  
> queued up for 2.6.38.
>
> David Daney

indeed. If you look at the x86 or sparc bits the fixup should be quite
small. The bulk of the changes are in the common code.

thanks,

-Jason


* Re: [PATCH 1/2] jump label: make enable/disable o(1)
  2011-01-05 17:31   ` Steven Rostedt
@ 2011-01-05 21:19     ` Jason Baron
  0 siblings, 0 replies; 113+ messages in thread
From: Jason Baron @ 2011-01-05 21:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, mathieu.desnoyers, hpa, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Wed, Jan 05, 2011 at 12:31:05PM -0500, Steven Rostedt wrote:
> On Wed, 2011-01-05 at 10:43 -0500, Jason Baron wrote:
> >  
> > +struct jump_label_key {
> > +	int state;
> > +};
> > +
> >  #define JUMP_LABEL(key, label)			\
> >  do {						\
> > -	if (unlikely(*key))			\
> > +	if (unlikely(((struct jump_label_key *)key)->state))		\
> >  		goto label;			\
> >  } while (0)
> 
> Anything that uses JUMP_LABEL() should pass in a pointer to a struct
> jump_label_key. Hence, remove the typecast. That can only lead to hard
> to find bugs.
> 
> -- Steve
> 
> 

right. The second patch in the series converts the JUMP_LABEL() macro to
static __always_inline bool static_branch(struct jump_label_key *key).

So, that addresses this concern.

thanks,

-Jason


* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 18:52       ` H. Peter Anvin
@ 2011-01-05 21:19         ` Jason Baron
  0 siblings, 0 replies; 113+ messages in thread
From: Jason Baron @ 2011-01-05 21:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, Frederic Weisbecker, peterz, mathieu.desnoyers,
	mingo, tglx, andi, roland, rth, masami.hiramatsu.pt, avi, davem,
	sam, ddaney, michael, linux-kernel

On Wed, Jan 05, 2011 at 10:52:05AM -0800, H. Peter Anvin wrote:
> On 01/05/2011 09:46 AM, Steven Rostedt wrote:
> > On Wed, 2011-01-05 at 18:15 +0100, Frederic Weisbecker wrote:
> >> On Wed, Jan 05, 2011 at 10:43:12AM -0500, Jason Baron wrote:
> >>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> >>> index 152f7de..0ad9c2e 100644
> >>> --- a/include/linux/jump_label.h
> >>> +++ b/include/linux/jump_label.h
> >>> @@ -22,6 +22,11 @@ struct module;
> >>>  
> >>>  #ifdef HAVE_JUMP_LABEL
> >>>  
> >>> +static __always_inline bool static_branch(struct jump_label_key *key)
> >>> +{
> >>> +	return __static_branch(key);
> >>
> >> Not very important, but __static_branch() would be more self-explained
> >> if it was called arch_static_branch().
> > 
> > I disagree, I think it is very important ;-)
> > 
> > Yes, the kernel has been moving to adding "arch_" to functions that are
> > implemented dependently by different archs. Please change this to
> > "arch_static_branch()".
> > 
> 
> Indeed.  This hugely simplifies knowing where to look and whose
> responsibility it is.
> 
> 	-hpa

agreed. updated.

thanks,

-Jason


* Re: [PATCH 2/2] jump label: introduce static_branch()
  2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
                     ` (2 preceding siblings ...)
  2011-01-05 17:41   ` Steven Rostedt
@ 2011-01-09 18:48   ` Mathieu Desnoyers
  3 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-01-09 18:48 UTC (permalink / raw)
  To: Jason Baron
  Cc: peterz, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Jason Baron (jbaron@redhat.com) wrote:
> Introduce:
> 
> static __always_inline bool static_branch(struct jump_label_key *key)
> 
> to replace the old JUMP_LABEL(key, label) macro.
> 
> The new static_branch(), simplifies the usage of jump labels. Since,
> static_branch() returns a boolean, it can be used as part of an if()
> construct. It also, allows us to drop the 'label' argument from the
> prototype. Its probably best understood with an example, here is the part
> of the patch that converts the tracepoints to use unlikely_switch():

small nit:

  s/unlikely_switch/arch_static_branch/g

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-01-05 15:43 [PATCH 0/2] jump label: 2.6.38 updates Jason Baron
  2011-01-05 15:43 ` [PATCH 1/2] jump label: make enable/disable o(1) Jason Baron
  2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
@ 2011-02-11 19:25 ` Peter Zijlstra
  2011-02-11 21:13   ` Mathieu Desnoyers
       [not found]   ` <BLU0-SMTP101B686C32E10BA346B15F896EF0@phx.gbl>
  2 siblings, 2 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-11 19:25 UTC (permalink / raw)
  To: Jason Baron
  Cc: mathieu.desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Wed, 2011-01-05 at 10:43 -0500, Jason Baron wrote:
> Hi,
> 
> The first patch uses the storage space of the jump label key address
> as a pointer into the update table. In this way, we can find all
> the addresses that need to be updated without hashing.
> 
> The second patch introduces:
> 
> static __always_inline bool static_branch(struct jump_label_key *key);
> 
> instead of the old JUMP_LABEL(key, label) macro.
> 
> In this way, jump labels become really easy to use:
> 
> Define:
> 
>         struct jump_label_key jump_key;
> 
> Can be used as:
> 
>         if (static_branch(&jump_key))
>                 do unlikely code
> 
> enable/disale via:
> 
>         jump_label_enable(&jump_key);
>         jump_label_disable(&jump_key);
> 
> that's it!
> 
> For perf, which also uses jump labels, I've left the reference counting
> out of the jump label layer, thus removing the 'jump_label_inc()' and
> 'jump_label_dec()' interface. Hopefully, this is a more palatable solution.

Right, let's go with this. Maybe we'll manage to come up with something
saner than _else_atomic_read(), but for now it's an improvement over what
we have.



* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 19:25 ` [PATCH 0/2] jump label: 2.6.38 updates Peter Zijlstra
@ 2011-02-11 21:13   ` Mathieu Desnoyers
       [not found]   ` <BLU0-SMTP101B686C32E10BA346B15F896EF0@phx.gbl>
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-11 21:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Wed, 2011-01-05 at 10:43 -0500, Jason Baron wrote:
> > Hi,
> > 
> > The first patch uses the storage space of the jump label key address
> > as a pointer into the update table. In this way, we can find all
> > the addresses that need to be updated without hashing.
> > 
> > The second patch introduces:
> > 
> > static __always_inline bool static_branch(struct jump_label_key *key);
> > 
> > instead of the old JUMP_LABEL(key, label) macro.
> > 
> > In this way, jump labels become really easy to use:
> > 
> > Define:
> > 
> >         struct jump_label_key jump_key;
> > 
> > Can be used as:
> > 
> >         if (static_branch(&jump_key))
> >                 do unlikely code
> > 
> > enable/disale via:
> > 
> >         jump_label_enable(&jump_key);
> >         jump_label_disable(&jump_key);
> > 
> > that's it!
> > 
> > For perf, which also uses jump labels, I've left the reference counting
> > out of the jump label layer, thus removing the 'jump_label_inc()' and
> > 'jump_label_dec()' interface. Hopefully, this is a more palatable solution.
> 
> Right, lets go with this. Maybe we'll manage to come up with something
> saner than _else_atomic_read(), but for now its an improvement over what
> we have.

I agree that keeping jump_label.h with the minimal clean API is a good
goal, and this patchset is almost there (maybe except for the
_else_atomic_read() part).

Hrm, given that the atomic inc/dec return and test for 1/0 is moved into
the Perf code, I wonder if it would make sense to move the
"_else_atomic_read()" oddness into the perf code too ? Perf could
declare, in its own header, a wrapper over __static_branch, e.g. put in
perf_event.h:

#ifdef HAVE_JUMP_LABEL
static __always_inline
bool perf_sw_event_static_branch_refcount(struct jump_label_key *key,
                                          atomic_t *ref)
{
	return __static_branch(key);
}
#else
static __always_inline
bool perf_sw_event_static_branch_refcount(struct jump_label_key *key,
                                          atomic_t *ref)
{
	if (unlikely(atomic_read(ref)))
		return true;
	return false;
}
#endif

Otherwise, jump_label_ref.h looks like a half-baked interface that only
provides the "test" API but not the ref/unref. If we have only a single
user interested in refcounting, it might make more sense to put the code
in perf_event.h. If we have many users using an atomic refcount like
this, then we should extend jump_label_ref.h to also provide the
ref/unref API too. I don't care much about where it ends up, as long as
it's a consistent choice.

Thoughts ?

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]   ` <BLU0-SMTP101B686C32E10BA346B15F896EF0@phx.gbl>
@ 2011-02-11 21:38     ` Peter Zijlstra
  2011-02-11 22:15       ` Jason Baron
                         ` (3 more replies)
  0 siblings, 4 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-11 21:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Fri, 2011-02-11 at 16:13 -0500, Mathieu Desnoyers wrote:
> 
> Thoughts ? 

 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+
+struct jump_label_key {
+       void *ptr;
+};

 struct jump_label_entry {
        struct hlist_node hlist;
        struct jump_entry *table;
-       int nr_entries;
        /* hang modules off here */
        struct hlist_head modules;
        unsigned long key;
+       u32 nr_entries;
+       int refcount;
 };

#else

+struct jump_label_key {
+       int state;
+};

#endif



So why can't we make that jump_label_entry::refcount and
jump_label_key::state an atomic_t and be done with it?

Then the enabled case uses if (atomic_inc_return(&key->ptr->refcount) ==
1), and the disabled atomic_inc(&key->state).
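
A rough userspace sketch of the idea above, assuming C11 atomics in place
of atomic_t. The demo_* names and the patch counter are hypothetical and
only illustrate that the expensive site update would run on the 0 -> 1 and
1 -> 0 transitions, not on every increment or decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical key whose state doubles as the reference count. */
struct demo_ref_key {
	atomic_int state;
};

/* Counts how often the (expensive) code-patching step would run. */
static int demo_patch_calls;

/* Placeholder for rewriting every branch site attached to the key. */
static inline void demo_patch_sites(struct demo_ref_key *key, bool on)
{
	(void)key;
	(void)on;
	demo_patch_calls++;
}

/* Mirrors "atomic_inc_return(...) == 1": only the first reference patches. */
static inline void demo_key_inc(struct demo_ref_key *key)
{
	if (atomic_fetch_add(&key->state, 1) + 1 == 1)
		demo_patch_sites(key, true);
}

/* Only dropping the last reference patches the sites back. */
static inline void demo_key_dec(struct demo_ref_key *key)
{
	if (atomic_fetch_sub(&key->state, 1) - 1 == 0)
		demo_patch_sites(key, false);
}

/* Fallback test for builds without code patching: read the count. */
static inline bool demo_key_enabled(struct demo_ref_key *key)
{
	return atomic_load(&key->state) > 0;
}

/* Two increments cause one patch; two decrements cause one more. */
void demo_ref_key_usage(void)
{
	struct demo_ref_key key = { .state = 0 };
	int before = demo_patch_calls;

	demo_key_inc(&key);
	demo_key_inc(&key);
	assert(demo_key_enabled(&key));
	assert(demo_patch_calls == before + 1);
	demo_key_dec(&key);
	demo_key_dec(&key);
	assert(!demo_key_enabled(&key));
	assert(demo_patch_calls == before + 2);
}
```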



* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 21:38     ` Peter Zijlstra
@ 2011-02-11 22:15       ` Jason Baron
  2011-02-11 22:19         ` H. Peter Anvin
  2011-02-11 22:30         ` Mathieu Desnoyers
  2011-02-11 22:20       ` Mathieu Desnoyers
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 113+ messages in thread
From: Jason Baron @ 2011-02-11 22:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Fri, Feb 11, 2011 at 10:38:17PM +0100, Peter Zijlstra wrote:
> On Fri, 2011-02-11 at 16:13 -0500, Mathieu Desnoyers wrote:
> > 
> > Thoughts ? 
> 
>  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> +
> +struct jump_label_key {
> +       void *ptr;
> +};
> 
>  struct jump_label_entry {
>         struct hlist_node hlist;
>         struct jump_entry *table;
> -       int nr_entries;
>         /* hang modules off here */
>         struct hlist_head modules;
>         unsigned long key;
> +       u32 nr_entries;
> +       int refcount;
>  };
> 
> #else
> 
> +struct jump_label_key {
> +       int state;
> +};
> 
> #endif
> 
> 
> 
> So why can't we make that jump_label_entry::refcount and
> jump_label_key::state an atomic_t and be done with it?
> 
> Then the enabled case uses if (atomic_inc_return(&key->ptr->refcount) ==
> 1), and the disabled atomic_inc(&key->state).
> 

a bit of history...

For the disabled jump label case, we didn't want to incur an atomic_read() to
check if the branch was enabled.

So, I separated the API, to have one for the non-atomic case, and one
for the atomic case. Nobody liked that.

So now, I'm proposing to leave the core API based around a non-atomic
variable, and have any callers that want to use this atomic interface,
to call into the non-atomic interface. If another user besides perf
wants to use the same type of atomic interface, we can re-visit the
decision?

thanks,

-Jason


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 22:15       ` Jason Baron
@ 2011-02-11 22:19         ` H. Peter Anvin
  2011-02-11 22:30         ` Mathieu Desnoyers
  1 sibling, 0 replies; 113+ messages in thread
From: H. Peter Anvin @ 2011-02-11 22:19 UTC (permalink / raw)
  To: Jason Baron
  Cc: Peter Zijlstra, Mathieu Desnoyers, rostedt, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam,
	ddaney, michael, linux-kernel

On 02/11/2011 02:15 PM, Jason Baron wrote:
> 
> a bit of history...
> 
> For the disabled jump label case, we didn't want to incur an atomic_read() to
> check if the branch was enabled.
> 
> So, I separated the API, to have one for the non-atomic case, and one
> for the atomic case. Nobody liked that.
> 
> So now, I'm proposing to leave the core API based around a non-atomic
> variable, and have any callers that want to use this atomic interface,
> to call into the non-atomic interface. If another user besides perf
> wants to use the same type of atomic interface, we can re-visit the
> decsion? 
> 

What is the problem with taking the atomic_read()?

	-hpa


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 21:38     ` Peter Zijlstra
  2011-02-11 22:15       ` Jason Baron
@ 2011-02-11 22:20       ` Mathieu Desnoyers
       [not found]       ` <BLU0-SMTP8562BA758CF8AAE5323AE296EF0@phx.gbl>
  2011-02-12 18:47       ` Peter Zijlstra
  3 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-11 22:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Fri, 2011-02-11 at 16:13 -0500, Mathieu Desnoyers wrote:
> > 
> > Thoughts ? 
> 
>  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> +
> +struct jump_label_key {
> +       void *ptr;
> +};
> 
>  struct jump_label_entry {
>         struct hlist_node hlist;
>         struct jump_entry *table;
> -       int nr_entries;
>         /* hang modules off here */
>         struct hlist_head modules;
>         unsigned long key;
> +       u32 nr_entries;
> +       int refcount;
>  };
> 
> #else
> 
> +struct jump_label_key {
> +       int state;
> +};
> 
> #endif
> 
> So why can't we make that jump_label_entry::refcount and
> jump_label_key::state an atomic_t and be done with it?
> 
> Then the enabled case uses if (atomic_inc_return(&key->ptr->refcount) ==
> 1), and the disabled atomic_inc(&key->state).
> 

OK, by "enabled" you mean #if defined(CC_HAVE_ASM_GOTO) &&
defined(CONFIG_JUMP_LABEL), and "disabled", the #else.

I guess the only downside is the extra volatile for the atomic_read for
the fallback case, which is not really much of a problem realistically
speaking: anyway, the volatile is a good thing to have in the fallback
case to force the compiler to re-read the variable. Let's go with your
idea.
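
To make the "extra volatile" concrete, here is a small userspace sketch.
It assumes atomic_read() boils down to a volatile load of the counter;
demo_atomic_t and the demo_* functions are hypothetical stand-ins, not the
kernel's definitions. The volatile access is what stops the compiler from
hoisting the load out of a loop or caching it in a register.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for atomic_t. */
typedef struct {
	int counter;
} demo_atomic_t;

/* Volatile load: the compiler must emit a fresh read on every call. */
static inline int demo_atomic_read(const demo_atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}

/* Fallback branch test built on the volatile read. */
static inline bool demo_fallback_branch(const demo_atomic_t *state)
{
	return demo_atomic_read(state) != 0;
}

/* Counts iterations that saw the flag set; each iteration reloads it. */
int demo_count_hits(const demo_atomic_t *state, int iterations)
{
	int hits = 0;

	for (int i = 0; i < iterations; i++)
		if (demo_fallback_branch(state))
			hits++;
	return hits;
}

/* With the flag clear nothing is counted; once set, every pass counts. */
void demo_volatile_usage(void)
{
	demo_atomic_t state = { .counter = 0 };

	assert(demo_count_hits(&state, 4) == 0);
	state.counter = 1;
	assert(demo_count_hits(&state, 4) == 4);
}
```

The cost under discussion is exactly this forced reload at each branch
site, which is why measuring it on !JUMP_LABEL builds matters.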

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]       ` <BLU0-SMTP8562BA758CF8AAE5323AE296EF0@phx.gbl>
@ 2011-02-11 22:27         ` Jason Baron
  2011-02-11 22:32           ` Mathieu Desnoyers
  0 siblings, 1 reply; 113+ messages in thread
From: Jason Baron @ 2011-02-11 22:27 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Fri, Feb 11, 2011 at 05:20:25PM -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Fri, 2011-02-11 at 16:13 -0500, Mathieu Desnoyers wrote:
> > > 
> > > Thoughts ? 
> > 
> >  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> > +
> > +struct jump_label_key {
> > +       void *ptr;
> > +};
> > 
> >  struct jump_label_entry {
> >         struct hlist_node hlist;
> >         struct jump_entry *table;
> > -       int nr_entries;
> >         /* hang modules off here */
> >         struct hlist_head modules;
> >         unsigned long key;
> > +       u32 nr_entries;
> > +       int refcount;
> >  };
> > 
> > #else
> > 
> > +struct jump_label_key {
> > +       int state;
> > +};
> > 
> > #endif
> > 
> > So why can't we make that jump_label_entry::refcount and
> > jump_label_key::state an atomic_t and be done with it?
> > 
> > Then the enabled case uses if (atomic_inc_return(&key->ptr->refcount) ==
> > 1), and the disabled atomic_inc(&key->state).
> > 
> 
> OK, by "enabled" you mean #if defined(CC_HAVE_ASM_GOTO) &&
> defined(CONFIG_JUMP_LABEL), and "disabled", the #else.
> 
> I guess the only downside is the extra volatile for the atomic_read for
> the fallback case, which is not really much of problem realistically
> speaking: anyway, the volatile is a good thing to have in the fallback
> case to force the compiler to re-read the variable. Let's go with your
> idea.
> 
> Thanks,
> 
> Mathieu
> 

ok, I'll try and re-spin the interface based around atomic_t, if we are all
agreed...there was also a circular dependency issue with atomic.h including
kernel.h which included jump_label.h, and that was why we had a separate
jump_label_ref.h header file, but hopefully I can resolve that in a clean
way.

thanks,

-Jason


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 22:15       ` Jason Baron
  2011-02-11 22:19         ` H. Peter Anvin
@ 2011-02-11 22:30         ` Mathieu Desnoyers
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-11 22:30 UTC (permalink / raw)
  To: Jason Baron
  Cc: Peter Zijlstra, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Jason Baron (jbaron@redhat.com) wrote:
> On Fri, Feb 11, 2011 at 10:38:17PM +0100, Peter Zijlstra wrote:
> > On Fri, 2011-02-11 at 16:13 -0500, Mathieu Desnoyers wrote:
> > > 
> > > Thoughts ? 
> > 
> >  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> > +
> > +struct jump_label_key {
> > +       void *ptr;
> > +};
> > 
> >  struct jump_label_entry {
> >         struct hlist_node hlist;
> >         struct jump_entry *table;
> > -       int nr_entries;
> >         /* hang modules off here */
> >         struct hlist_head modules;
> >         unsigned long key;
> > +       u32 nr_entries;
> > +       int refcount;
> >  };
> > 
> > #else
> > 
> > +struct jump_label_key {
> > +       int state;
> > +};
> > 
> > #endif
> > 
> > 
> > 
> > So why can't we make that jump_label_entry::refcount and
> > jump_label_key::state an atomic_t and be done with it?
> > 
> > Then the enabled case uses if (atomic_inc_return(&key->ptr->refcount) ==
> > 1), and the disabled atomic_inc(&key->state).
> > 
> 
> a bit of history...
> 
> For the disabled jump label case, we didn't want to incur an atomic_read() to
> check if the branch was enabled.
> 
> So, I separated the API, to have one for the non-atomic case, and one
> for the atomic case. Nobody liked that.
> 
> So now, I'm proposing to leave the core API based around a non-atomic
> variable, and have any callers that want to use this atomic interface,
> to call into the non-atomic interface. If another user besides perf
> wants to use the same type of atomic interface, we can re-visit the
> decsion? 

See my other email to PeterZ. I think it might be better to keep the
interface really clean and take the compiler optimization hit on the
volatile if we figure out that it is negligible. I'd love to see
benchmarks on the impact of this change to justify that we can actually
dismiss the performance impact. We have enough tracepoints in the kernel
that if we figure out that it does not make a noticeable performance
difference in !JUMP_LABEL configs with tracepoints enabled, we can as
well take the volatile. But please document these benchmarks in the
patch changelog. Also looking at the disassembly of core instrumented
kernel functions to see if the added volatile hurts the basic block
ordering, and documenting that, would be helpful.

I'd recommend a jump_label_ref()/jump_label_unref() interface (similar
to kref) instead of enable/disable, though (to make it clear that we have
reference counter handling in there).

Long story short: I'm not against adding the volatile read here. I'm
against adding it without measuring and documenting the impact of this
change.

Thanks,

Mathieu

> 
> thanks,
> 
> -Jason

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 22:27         ` Jason Baron
@ 2011-02-11 22:32           ` Mathieu Desnoyers
  0 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-11 22:32 UTC (permalink / raw)
  To: Jason Baron
  Cc: Peter Zijlstra, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Jason Baron (jbaron@redhat.com) wrote:
> On Fri, Feb 11, 2011 at 05:20:25PM -0500, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > On Fri, 2011-02-11 at 16:13 -0500, Mathieu Desnoyers wrote:
> > > > 
> > > > Thoughts ? 
> > > 
> > >  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> > > +
> > > +struct jump_label_key {
> > > +       void *ptr;
> > > +};
> > > 
> > >  struct jump_label_entry {
> > >         struct hlist_node hlist;
> > >         struct jump_entry *table;
> > > -       int nr_entries;
> > >         /* hang modules off here */
> > >         struct hlist_head modules;
> > >         unsigned long key;
> > > +       u32 nr_entries;
> > > +       int refcount;
> > >  };
> > > 
> > > #else
> > > 
> > > +struct jump_label_key {
> > > +       int state;
> > > +};
> > > 
> > > #endif
> > > 
> > > So why can't we make that jump_label_entry::refcount and
> > > jump_label_key::state an atomic_t and be done with it?
> > > 
> > > Then the enabled case uses if (atomic_inc_return(&key->ptr->refcount) ==
> > > 1), and the disabled atomic_inc(&key->state).
> > > 
> > 
> > OK, by "enabled" you mean #if defined(CC_HAVE_ASM_GOTO) &&
> > defined(CONFIG_JUMP_LABEL), and "disabled", the #else.
> > 
> > I guess the only downside is the extra volatile for the atomic_read for
> > the fallback case, which is not really much of problem realistically
> > speaking: anyway, the volatile is a good thing to have in the fallback
> > case to force the compiler to re-read the variable. Let's go with your
> > idea.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> 
> ok, I'll try and re-spin the interface based around atomic_t, if we are all
> agreed...there was also a circular dependency issue with atomic.h including
> kernel.h which included jump_label.h, and that was why we had a separate,
> jump_label_ref.h header file, but hopefully I can be resolve that in a clean
> way.

See spinlocks ?

jump_label_types.h (structure definitions, includes types.h,
                    included from kernel.h)
jump_label.h (prototypes, inline functions, includes atomic.h)

Thanks,

Mathieu

> 
> thanks,
> 
> -Jason

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-11 21:38     ` Peter Zijlstra
                         ` (2 preceding siblings ...)
       [not found]       ` <BLU0-SMTP8562BA758CF8AAE5323AE296EF0@phx.gbl>
@ 2011-02-12 18:47       ` Peter Zijlstra
  2011-02-14 12:27         ` Ingo Molnar
                           ` (2 more replies)
  3 siblings, 3 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-12 18:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> 
> So why can't we make that jump_label_entry::refcount and
> jump_label_key::state an atomic_t and be done with it? 

So I had a bit of a poke at this because I didn't quite understand why
all that stuff was as it was. I applied both Jason's patches and then
basically rewrote kernel/jump_label.c just for kicks ;-)

I haven't tried compiling this, let alone running it, but provided I
didn't actually forget anything the storage per key is now 16 bytes when
modules are disabled and 24 * (1 + mods) bytes for when they are
enabled. The old code had 64 + 40 * mods bytes.
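
The figures above can be turned into a quick calculation. This is only a
sketch of the arithmetic as quoted, taking "mods" to mean the number of
modules referencing a key (an assumption); the demo_* helpers are
hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Per-key storage quoted for the rewrite: 16 bytes without module
 * support, 24 * (1 + mods) bytes with it.
 */
static inline size_t demo_new_key_bytes(int modules_enabled, size_t mods)
{
	return modules_enabled ? 24 * (1 + mods) : 16;
}

/* Per-key storage quoted for the old code: 64 + 40 * mods bytes. */
static inline size_t demo_old_key_bytes(size_t mods)
{
	return 64 + 40 * mods;
}

/* A key referenced by two modules: 72 bytes new versus 144 bytes old. */
void demo_storage_usage(void)
{
	assert(demo_new_key_bytes(1, 2) == 72);
	assert(demo_old_key_bytes(2) == 144);
	assert(demo_new_key_bytes(0, 0) == 16);
}
```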

I still need to clean up the static_branch_else bits and look at !x86
aside from the already mentioned bits.. but what do people think?

---
 arch/sparc/include/asm/jump_label.h |   25 +-
 arch/x86/include/asm/jump_label.h   |   22 +-
 arch/x86/kernel/jump_label.c        |    2 +-
 arch/x86/kernel/module.c            |    3 -
 include/linux/dynamic_debug.h       |   10 +-
 include/linux/jump_label.h          |   71 +++---
 include/linux/jump_label_ref.h      |   36 +--
 include/linux/module.h              |    1 +
 include/linux/perf_event.h          |   28 +-
 include/linux/tracepoint.h          |    8 +-
 kernel/jump_label.c                 |  516 +++++++++++++----------------------
 kernel/module.c                     |    7 +
 kernel/perf_event.c                 |   30 ++-
 kernel/timer.c                      |    8 +-
 kernel/tracepoint.c                 |   22 +-
 15 files changed, 333 insertions(+), 456 deletions(-)

diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h
index 427d468..e4ca085 100644
--- a/arch/sparc/include/asm/jump_label.h
+++ b/arch/sparc/include/asm/jump_label.h
@@ -7,17 +7,20 @@
 
 #define JUMP_LABEL_NOP_SIZE 4
 
-#define JUMP_LABEL(key, label)					\
-	do {							\
-		asm goto("1:\n\t"				\
-			 "nop\n\t"				\
-			 "nop\n\t"				\
-			 ".pushsection __jump_table,  \"a\"\n\t"\
-			 ".align 4\n\t"				\
-			 ".word 1b, %l[" #label "], %c0\n\t"	\
-			 ".popsection \n\t"			\
-			 : :  "i" (key) :  : label);\
-	} while (0)
+static __always_inline bool __static_branch(struct jump_label_key *key)
+{
+		asm goto("1:\n\t"
+			 "nop\n\t"
+			 "nop\n\t"
+			 ".pushsection __jump_table,  \"a\"\n\t"
+			 ".align 4\n\t"
+			 ".word 1b, %l[l_yes], %c0\n\t"
+			 ".popsection \n\t"
+			 : :  "i" (key) :  : l_yes);
+	return false;
+l_yes:
+	return true;
+}
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 574dbc2..3d44a7c 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -5,20 +5,24 @@
 
 #include <linux/types.h>
 #include <asm/nops.h>
+#include <asm/asm.h>
 
 #define JUMP_LABEL_NOP_SIZE 5
 
 # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
 
-# define JUMP_LABEL(key, label)					\
-	do {							\
-		asm goto("1:"					\
-			JUMP_LABEL_INITIAL_NOP			\
-			".pushsection __jump_table,  \"aw\" \n\t"\
-			_ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
-			".popsection \n\t"			\
-			: :  "i" (key) :  : label);		\
-	} while (0)
+static __always_inline bool __static_branch(struct jump_label_key *key)
+{
+	asm goto("1:"
+		JUMP_LABEL_INITIAL_NOP
+		".pushsection __jump_table,  \"a\" \n\t"
+		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
+		".popsection \n\t"
+		: :  "i" (key) : : l_yes );
+	return false;
+l_yes:
+	return true;
+}
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 961b6b3..dfa4c3c 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -4,13 +4,13 @@
  * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
  *
  */
-#include <linux/jump_label.h>
 #include <linux/memory.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
 #include <linux/list.h>
 #include <linux/jhash.h>
 #include <linux/cpu.h>
+#include <linux/jump_label.h>
 #include <asm/kprobes.h>
 #include <asm/alternative.h>
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index ab23f1a..0e6b823 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -230,9 +230,6 @@ int module_finalize(const Elf_Ehdr *hdr,
 		apply_paravirt(pseg, pseg + para->sh_size);
 	}
 
-	/* make jump label nops */
-	jump_label_apply_nops(me);
-
 	return 0;
 }
 
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 1c70028..2ade291 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -33,7 +33,7 @@ struct _ddebug {
 #define _DPRINTK_FLAGS_PRINT   (1<<0)  /* printk() a message using the format */
 #define _DPRINTK_FLAGS_DEFAULT 0
 	unsigned int flags:8;
-	char enabled;
+	struct jump_label_key enabled;
 } __attribute__((aligned(8)));
 
 
@@ -48,8 +48,8 @@ extern int ddebug_remove_module(const char *mod_name);
 	__used								\
 	__attribute__((section("__verbose"), aligned(8))) =		\
 	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
-		_DPRINTK_FLAGS_DEFAULT };				\
-	if (unlikely(descriptor.enabled))				\
+		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
+	if (static_branch(&descriptor.enabled))				\
 		printk(KERN_DEBUG pr_fmt(fmt),	##__VA_ARGS__);		\
 	} while (0)
 
@@ -59,8 +59,8 @@ extern int ddebug_remove_module(const char *mod_name);
 	__used								\
 	__attribute__((section("__verbose"), aligned(8))) =		\
 	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
-		_DPRINTK_FLAGS_DEFAULT };				\
-	if (unlikely(descriptor.enabled))				\
+		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
+	if (static_branch(&descriptor.enabled))				\
 		dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__);	\
 	} while (0)
 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7880f18..a1cec0a 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -2,19 +2,35 @@
 #define _LINUX_JUMP_LABEL_H
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+
+struct jump_label_key {
+	atomic_t enabled;
+	struct jump_entry *entries;
+#ifdef CONFIG_MODULES
+	struct jump_label_mod *next;
+#endif
+};
+
 # include <asm/jump_label.h>
 # define HAVE_JUMP_LABEL
 #endif
 
 enum jump_label_type {
+	JUMP_LABEL_DISABLE = 0,
 	JUMP_LABEL_ENABLE,
-	JUMP_LABEL_DISABLE
 };
 
 struct module;
 
+#define JUMP_LABEL_INIT { 0 }
+
 #ifdef HAVE_JUMP_LABEL
 
+static __always_inline bool static_branch(struct jump_label_key *key)
+{
+	return __static_branch(key);
+}
+
 extern struct jump_entry __start___jump_table[];
 extern struct jump_entry __stop___jump_table[];
 
@@ -23,37 +39,31 @@ extern void jump_label_unlock(void);
 extern void arch_jump_label_transform(struct jump_entry *entry,
 				 enum jump_label_type type);
 extern void arch_jump_label_text_poke_early(jump_label_t addr);
-extern void jump_label_update(unsigned long key, enum jump_label_type type);
-extern void jump_label_apply_nops(struct module *mod);
 extern int jump_label_text_reserved(void *start, void *end);
-
-#define jump_label_enable(key) \
-	jump_label_update((unsigned long)key, JUMP_LABEL_ENABLE);
-
-#define jump_label_disable(key) \
-	jump_label_update((unsigned long)key, JUMP_LABEL_DISABLE);
+extern void jump_label_enable(struct jump_label_key *key);
+extern void jump_label_disable(struct jump_label_key *key);
 
 #else
 
-#define JUMP_LABEL(key, label)			\
-do {						\
-	if (unlikely(*key))			\
-		goto label;			\
-} while (0)
+struct jump_label_key {
+	atomic_t enabled;
+};
 
-#define jump_label_enable(cond_var)	\
-do {					\
-       *(cond_var) = 1;			\
-} while (0)
+static __always_inline bool static_branch(struct jump_label_key *key)
+{
+	if (unlikely(atomic_read(&key->enabled)))
+		return true;
+	return false;
+}
 
-#define jump_label_disable(cond_var)	\
-do {					\
-       *(cond_var) = 0;			\
-} while (0)
+static inline void jump_label_enable(struct jump_label_key *key)
+{
+	atomic_inc(&key->enabled);
+}
 
-static inline int jump_label_apply_nops(struct module *mod)
+static inline void jump_label_disable(struct jump_label_key *key)
 {
-	return 0;
+	atomic_dec(&key->enabled);
 }
 
 static inline int jump_label_text_reserved(void *start, void *end)
@@ -66,14 +76,9 @@ static inline void jump_label_unlock(void) {}
 
 #endif
 
-#define COND_STMT(key, stmt)					\
-do {								\
-	__label__ jl_enabled;					\
-	JUMP_LABEL(key, jl_enabled);				\
-	if (0) {						\
-jl_enabled:							\
-		stmt;						\
-	}							\
-} while (0)
+static inline bool jump_label_enabled(struct jump_label_key *key)
+{
+	return !!atomic_read(&key->enabled);
+}
 
 #endif
diff --git a/include/linux/jump_label_ref.h b/include/linux/jump_label_ref.h
index e5d012a..5178696 100644
--- a/include/linux/jump_label_ref.h
+++ b/include/linux/jump_label_ref.h
@@ -4,41 +4,27 @@
 #include <linux/jump_label.h>
 #include <asm/atomic.h>
 
-#ifdef HAVE_JUMP_LABEL
+struct jump_label_key_counter {
+	atomic_t ref;
+	struct jump_label_key key;
+};
 
-static inline void jump_label_inc(atomic_t *key)
-{
-	if (atomic_add_return(1, key) == 1)
-		jump_label_enable(key);
-}
+#ifdef HAVE_JUMP_LABEL
 
-static inline void jump_label_dec(atomic_t *key)
+static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
 {
-	if (atomic_dec_and_test(key))
-		jump_label_disable(key);
+	return __static_branch(key);
 }
 
 #else /* !HAVE_JUMP_LABEL */
 
-static inline void jump_label_inc(atomic_t *key)
+static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
 {
-	atomic_inc(key);
+	if (unlikely(atomic_read(count)))
+		return true;
+	return false;
 }
 
-static inline void jump_label_dec(atomic_t *key)
-{
-	atomic_dec(key);
-}
-
-#undef JUMP_LABEL
-#define JUMP_LABEL(key, label)						\
-do {									\
-	if (unlikely(__builtin_choose_expr(				\
-	      __builtin_types_compatible_p(typeof(key), atomic_t *),	\
-	      atomic_read((atomic_t *)(key)), *(key))))			\
-		goto label;						\
-} while (0)
-
 #endif /* HAVE_JUMP_LABEL */
 
 #endif /* _LINUX_JUMP_LABEL_REF_H */
diff --git a/include/linux/module.h b/include/linux/module.h
index 9bdf27c..eeb3e99 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -266,6 +266,7 @@ enum module_state
 	MODULE_STATE_LIVE,
 	MODULE_STATE_COMING,
 	MODULE_STATE_GOING,
+	MODULE_STATE_POST_RELOCATE,
 };
 
 struct module
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index dda5b0a..26fe115 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1000,7 +1000,7 @@ static inline int is_software_event(struct perf_event *event)
 	return event->pmu->task_ctx_nr == perf_sw_context;
 }
 
-extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+extern struct jump_label_key_counter perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
 extern void __perf_sw_event(u32, u64, int, struct pt_regs *, u64);
 
@@ -1029,30 +1029,32 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 {
 	struct pt_regs hot_regs;
 
-	JUMP_LABEL(&perf_swevent_enabled[event_id], have_event);
-	return;
-
-have_event:
-	if (!regs) {
-		perf_fetch_caller_regs(&hot_regs);
-		regs = &hot_regs;
+	if (static_branch_else_atomic_read(&perf_swevent_enabled[event_id].key,
+					 &perf_swevent_enabled[event_id].ref)) {
+		if (!regs) {
+			perf_fetch_caller_regs(&hot_regs);
+			regs = &hot_regs;
+		}
+		__perf_sw_event(event_id, nr, nmi, regs, addr);
 	}
-	__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
-extern atomic_t perf_task_events;
+extern struct jump_label_key_counter perf_task_events;
 
 static inline void perf_event_task_sched_in(struct task_struct *task)
 {
-	COND_STMT(&perf_task_events, __perf_event_task_sched_in(task));
+	if (static_branch_else_atomic_read(&perf_task_events.key,
+					   &perf_task_events.ref))
+		__perf_event_task_sched_in(task);
 }
 
 static inline
 void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
 {
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
-
-	COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
+	if (static_branch_else_atomic_read(&perf_task_events.key,
+					   &perf_task_events.ref))
+		__perf_event_task_sched_out(task, next);
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 97c84a5..6c8c747 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -29,7 +29,7 @@ struct tracepoint_func {
 
 struct tracepoint {
 	const char *name;		/* Tracepoint name */
-	int state;			/* State. */
+	struct jump_label_key key;
 	void (*regfunc)(void);
 	void (*unregfunc)(void);
 	struct tracepoint_func __rcu *funcs;
@@ -146,9 +146,7 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
-		JUMP_LABEL(&__tracepoint_##name.state, do_trace);	\
-		return;							\
-do_trace:								\
+		if (static_branch(&__tracepoint_##name.key))		\
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
@@ -181,7 +179,7 @@ do_trace:								\
 	__attribute__((section("__tracepoints_strings"))) = #name;	\
 	struct tracepoint __tracepoint_##name				\
 	__attribute__((section("__tracepoints"))) =			\
-		{ __tpstrtab_##name, 0, reg, unreg, NULL };		\
+		{ __tpstrtab_##name, JUMP_LABEL_INIT, reg, unreg, NULL };\
 	static struct tracepoint * const __tracepoint_ptr_##name __used	\
 	__attribute__((section("__tracepoints_ptrs"))) =		\
 		&__tracepoint_##name;
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3b79bd9..29b34be 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -2,9 +2,9 @@
  * jump label support
  *
  * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
+ * Copyright (C) 2011 Peter Zijlstra <pzijlstr@redhat.com>
  *
  */
-#include <linux/jump_label.h>
 #include <linux/memory.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
@@ -13,32 +13,13 @@
 #include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/err.h>
+#include <linux/jump_label.h>
 
 #ifdef HAVE_JUMP_LABEL
 
-#define JUMP_LABEL_HASH_BITS 6
-#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
-static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
-
 /* mutex to protect coming/going of the the jump_label table */
 static DEFINE_MUTEX(jump_label_mutex);
 
-struct jump_label_entry {
-	struct hlist_node hlist;
-	struct jump_entry *table;
-	int nr_entries;
-	/* hang modules off here */
-	struct hlist_head modules;
-	unsigned long key;
-};
-
-struct jump_label_module_entry {
-	struct hlist_node hlist;
-	struct jump_entry *table;
-	int nr_entries;
-	struct module *mod;
-};
-
 void jump_label_lock(void)
 {
 	mutex_lock(&jump_label_mutex);
@@ -64,7 +45,7 @@ static int jump_label_cmp(const void *a, const void *b)
 }
 
 static void
-sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
+jump_label_sort_entries(struct jump_entry *start, struct jump_entry *stop)
 {
 	unsigned long size;
 
@@ -73,118 +54,25 @@ sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
 	sort(start, size, sizeof(struct jump_entry), jump_label_cmp, NULL);
 }
 
-static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
-{
-	struct hlist_head *head;
-	struct hlist_node *node;
-	struct jump_label_entry *e;
-	u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
-
-	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
-	hlist_for_each_entry(e, node, head, hlist) {
-		if (key == e->key)
-			return e;
-	}
-	return NULL;
-}
+static void jump_label_update(struct jump_label_key *key, int enable);
 
-static struct jump_label_entry *
-add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
+void jump_label_enable(struct jump_label_key *key)
 {
-	struct hlist_head *head;
-	struct jump_label_entry *e;
-	u32 hash;
-
-	e = get_jump_label_entry(key);
-	if (e)
-		return ERR_PTR(-EEXIST);
-
-	e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
-	if (!e)
-		return ERR_PTR(-ENOMEM);
-
-	hash = jhash((void *)&key, sizeof(jump_label_t), 0);
-	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
-	e->key = key;
-	e->table = table;
-	e->nr_entries = nr_entries;
-	INIT_HLIST_HEAD(&(e->modules));
-	hlist_add_head(&e->hlist, head);
-	return e;
-}
+	if (atomic_inc_not_zero(&key->enabled))
+		return;
 
-static int
-build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
-{
-	struct jump_entry *iter, *iter_begin;
-	struct jump_label_entry *entry;
-	int count;
-
-	sort_jump_label_entries(start, stop);
-	iter = start;
-	while (iter < stop) {
-		entry = get_jump_label_entry(iter->key);
-		if (!entry) {
-			iter_begin = iter;
-			count = 0;
-			while ((iter < stop) &&
-				(iter->key == iter_begin->key)) {
-				iter++;
-				count++;
-			}
-			entry = add_jump_label_entry(iter_begin->key,
-							count, iter_begin);
-			if (IS_ERR(entry))
-				return PTR_ERR(entry);
-		 } else {
-			WARN_ONCE(1, KERN_ERR "build_jump_hashtable: unexpected entry!\n");
-			return -1;
-		}
-	}
-	return 0;
+	jump_label_lock();
+	if (atomic_add_return(1, &key->enabled) == 1)
+		jump_label_update(key, JUMP_LABEL_ENABLE);
+	jump_label_unlock();
 }
 
-/***
- * jump_label_update - update jump label text
- * @key -  key value associated with a a jump label
- * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
- *
- * Will enable/disable the jump for jump label @key, depending on the
- * value of @type.
- *
- */
-
-void jump_label_update(unsigned long key, enum jump_label_type type)
+void jump_label_disable(struct jump_label_key *key)
 {
-	struct jump_entry *iter;
-	struct jump_label_entry *entry;
-	struct hlist_node *module_node;
-	struct jump_label_module_entry *e_module;
-	int count;
+	if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex))
+		return;
 
-	jump_label_lock();
-	entry = get_jump_label_entry((jump_label_t)key);
-	if (entry) {
-		count = entry->nr_entries;
-		iter = entry->table;
-		while (count--) {
-			if (kernel_text_address(iter->code))
-				arch_jump_label_transform(iter, type);
-			iter++;
-		}
-		/* eanble/disable jump labels in modules */
-		hlist_for_each_entry(e_module, module_node, &(entry->modules),
-							hlist) {
-			count = e_module->nr_entries;
-			iter = e_module->table;
-			while (count--) {
-				if (iter->key &&
-						kernel_text_address(iter->code))
-					arch_jump_label_transform(iter, type);
-				iter++;
-			}
-		}
-	}
+	jump_label_update(key, JUMP_LABEL_DISABLE);
 	jump_label_unlock();
 }
 
@@ -197,77 +85,30 @@ static int addr_conflict(struct jump_entry *entry, void *start, void *end)
 	return 0;
 }
 
-#ifdef CONFIG_MODULES
-
-static int module_conflict(void *start, void *end)
-{
-	struct hlist_head *head;
-	struct hlist_node *node, *node_next, *module_node, *module_node_next;
-	struct jump_label_entry *e;
-	struct jump_label_module_entry *e_module;
-	struct jump_entry *iter;
-	int i, count;
-	int conflict = 0;
-
-	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
-		head = &jump_label_table[i];
-		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
-			hlist_for_each_entry_safe(e_module, module_node,
-							module_node_next,
-							&(e->modules), hlist) {
-				count = e_module->nr_entries;
-				iter = e_module->table;
-				while (count--) {
-					if (addr_conflict(iter, start, end)) {
-						conflict = 1;
-						goto out;
-					}
-					iter++;
-				}
-			}
-		}
-	}
-out:
-	return conflict;
-}
-
-#endif
-
-/***
- * jump_label_text_reserved - check if addr range is reserved
- * @start: start text addr
- * @end: end text addr
- *
- * checks if the text addr located between @start and @end
- * overlaps with any of the jump label patch addresses. Code
- * that wants to modify kernel text should first verify that
- * it does not overlap with any of the jump label addresses.
- * Caller must hold jump_label_mutex.
- *
- * returns 1 if there is an overlap, 0 otherwise
- */
-int jump_label_text_reserved(void *start, void *end)
+static int __jump_label_text_reserved(struct jump_entry *iter_start,
+		struct jump_entry *iter_stop, void *start, void *end)
 {
 	struct jump_entry *iter;
-	struct jump_entry *iter_start = __start___jump_table;
-	struct jump_entry *iter_stop = __start___jump_table;
-	int conflict = 0;
 
 	iter = iter_start;
 	while (iter < iter_stop) {
-		if (addr_conflict(iter, start, end)) {
-			conflict = 1;
-			goto out;
-		}
+		if (addr_conflict(iter, start, end))
+			return 1;
 		iter++;
 	}
 
-	/* now check modules */
-#ifdef CONFIG_MODULES
-	conflict = module_conflict(start, end);
-#endif
-out:
-	return conflict;
+	return 0;
+}
+
+static void __jump_label_update(struct jump_label_key *key, 
+		struct jump_entry *entry, int enable)
+{
+	for (; entry->key == (jump_label_t)key; entry++) {
+		if (WARN_ON_ONCE(!kernel_text_address(entry->code)))
+			continue;
+
+		arch_jump_label_transform(entry, enable);
+	}
 }
 
 /*
@@ -277,141 +118,155 @@ void __weak arch_jump_label_text_poke_early(jump_label_t addr)
 {
 }
 
-static __init int init_jump_label(void)
+static __init int jump_label_init(void)
 {
-	int ret;
 	struct jump_entry *iter_start = __start___jump_table;
 	struct jump_entry *iter_stop = __stop___jump_table;
+	struct jump_label_key *key = NULL;
 	struct jump_entry *iter;
 
 	jump_label_lock();
-	ret = build_jump_label_hashtable(__start___jump_table,
-					 __stop___jump_table);
-	iter = iter_start;
-	while (iter < iter_stop) {
+	jump_label_sort_entries(iter_start, iter_stop);
+
+	for (iter = iter_start; iter < iter_stop; iter++) {
 		arch_jump_label_text_poke_early(iter->code);
-		iter++;
+		if (iter->key == (jump_label_t)key)
+			continue;
+
+		key = (struct jump_label_key *)iter->key;
+		atomic_set(&key->enabled, 0);
+		key->entries = iter;
+#ifdef CONFIG_MODULES
+		key->next = NULL;
+#endif
 	}
 	jump_label_unlock();
-	return ret;
+
+	return 0;
 }
-early_initcall(init_jump_label);
+early_initcall(jump_label_init);
 
 #ifdef CONFIG_MODULES
 
-static struct jump_label_module_entry *
-add_jump_label_module_entry(struct jump_label_entry *entry,
-			    struct jump_entry *iter_begin,
-			    int count, struct module *mod)
-{
-	struct jump_label_module_entry *e;
-
-	e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
-	if (!e)
-		return ERR_PTR(-ENOMEM);
-	e->mod = mod;
-	e->nr_entries = count;
-	e->table = iter_begin;
-	hlist_add_head(&e->hlist, &entry->modules);
-	return e;
-}
+struct jump_label_mod {
+	struct jump_label_mod *next;
+	struct jump_entry *entries;
+	struct module *mod;
+};
 
-static int add_jump_label_module(struct module *mod)
+static int __jump_label_mod_text_reserved(void *start, void *end)
 {
-	struct jump_entry *iter, *iter_begin;
-	struct jump_label_entry *entry;
-	struct jump_label_module_entry *module_entry;
-	int count;
+	struct module *mod;
 
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
+	mod = __module_text_address(start);
+	if (!mod)
 		return 0;
 
-	sort_jump_label_entries(mod->jump_entries,
+	WARN_ON_ONCE(__module_text_address(end) != mod);
+
+	return __jump_label_text_reserved(mod->jump_entries,
 				mod->jump_entries + mod->num_jump_entries);
-	iter = mod->jump_entries;
-	while (iter < mod->jump_entries + mod->num_jump_entries) {
-		entry = get_jump_label_entry(iter->key);
-		iter_begin = iter;
-		count = 0;
-		while ((iter < mod->jump_entries + mod->num_jump_entries) &&
-			(iter->key == iter_begin->key)) {
-				iter++;
-				count++;
-		}
-		if (!entry) {
-			entry = add_jump_label_entry(iter_begin->key, 0, NULL);
-			if (IS_ERR(entry))
-				return PTR_ERR(entry);
-		}
-		module_entry = add_jump_label_module_entry(entry, iter_begin,
-							   count, mod);
-		if (IS_ERR(module_entry))
-			return PTR_ERR(module_entry);
+}
+
+static void __jump_label_mod_update(struct jump_label_key *key, int enable)
+{
+	struct jump_label_mod *mod = key->next;
+
+	while (mod) {
+		__jump_label_update(key, mod->entries, enable);
+		mod = mod->next;
 	}
-	return 0;
 }
 
-static void remove_jump_label_module(struct module *mod)
+/***
+ * jump_label_apply_nops - patch module jump labels with arch_get_jump_label_nop()
+ * @mod: module to patch
+ *
+ * Allow for run-time selection of the optimal nops. Before the module
+ * loads patch these with arch_get_jump_label_nop(), which is specified by
+ * the arch specific jump label code.
+ */
+static void jump_label_apply_nops(struct module *mod)
 {
-	struct hlist_head *head;
-	struct hlist_node *node, *node_next, *module_node, *module_node_next;
-	struct jump_label_entry *e;
-	struct jump_label_module_entry *e_module;
-	int i;
+	struct jump_entry *iter_start = mod->jump_entries;
+	struct jump_entry *iter_stop = mod->jump_entries + mod->num_jump_entries;
+	struct jump_entry *iter;
 
 	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
+	if (iter_start == iter_stop)
 		return;
 
-	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
-		head = &jump_label_table[i];
-		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
-			hlist_for_each_entry_safe(e_module, module_node,
-						  module_node_next,
-						  &(e->modules), hlist) {
-				if (e_module->mod == mod) {
-					hlist_del(&e_module->hlist);
-					kfree(e_module);
-				}
-			}
-			if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
-				hlist_del(&e->hlist);
-				kfree(e);
-			}
+	jump_label_sort_entries(iter_start, iter_stop);
+
+	for (iter = iter_start; iter < iter_stop; iter++)
+		arch_jump_label_text_poke_early(iter->code);
+}
+
+static int jump_label_add_module(struct module *mod)
+{
+	struct jump_entry *iter_start = mod->jump_entries;
+	struct jump_entry *iter_stop = mod->jump_entries + mod->num_jump_entries;
+	struct jump_entry *iter;
+	struct jump_label_key *key = NULL;
+	struct jump_label_mod *jlm;
+
+	for (iter = iter_start; iter < iter_stop; iter++) {
+		if (iter->key == (jump_label_t)key)
+			continue;
+
+		key = (struct jump_label_key *)iter->key;
+
+		if (__module_address(iter->key) == mod) {
+			atomic_set(&key->enabled, 0);
+			key->entries = iter;
+			key->next = NULL;
+			continue;
 		}
+
+		jlm = kzalloc(sizeof(struct jump_label_mod), GFP_KERNEL);
+		if (!jlm)
+			return -ENOMEM;
+
+		jlm->mod = mod;
+		jlm->entries = iter;
+		jlm->next = key->next;
+		key->next = jlm;
+
+		if (jump_label_enabled(key))
+			__jump_label_update(key, iter, JUMP_LABEL_ENABLE);
 	}
+
+	return 0;
 }
 
-static void remove_jump_label_module_init(struct module *mod)
+static void jump_label_del_module(struct module *mod)
 {
-	struct hlist_head *head;
-	struct hlist_node *node, *node_next, *module_node, *module_node_next;
-	struct jump_label_entry *e;
-	struct jump_label_module_entry *e_module;
+	struct jump_entry *iter_start = mod->jump_entries;
+	struct jump_entry *iter_stop = mod->jump_entries + mod->num_jump_entries;
 	struct jump_entry *iter;
-	int i, count;
+	struct jump_label_key *key = NULL;
+	struct jump_label_mod *jlm, **prev;
 
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
-		return;
+	for (iter = iter_start; iter < iter_stop; iter++) {
+		if (iter->key == (jump_label_t)key)
+			continue;
+
+		key = (struct jump_label_key *)iter->key;
+
+		if (__module_address(iter->key) == mod)
+			continue;
+
+		prev = &key->next;
+		jlm = key->next;
+
+		while (jlm && jlm->mod != mod) {
+			prev = &jlm->next;
+			jlm = jlm->next;
+		}
 
-	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
-		head = &jump_label_table[i];
-		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
-			hlist_for_each_entry_safe(e_module, module_node,
-						  module_node_next,
-						  &(e->modules), hlist) {
-				if (e_module->mod != mod)
-					continue;
-				count = e_module->nr_entries;
-				iter = e_module->table;
-				while (count--) {
-					if (within_module_init(iter->code, mod))
-						iter->key = 0;
-					iter++;
-				}
-			}
+		if (jlm) {
+			*prev = jlm->next;
+			kfree(jlm);
 		}
 	}
 }
@@ -424,61 +279,76 @@ jump_label_module_notify(struct notifier_block *self, unsigned long val,
 	int ret = 0;
 
 	switch (val) {
-	case MODULE_STATE_COMING:
+	case MODULE_STATE_POST_RELOCATE:
 		jump_label_lock();
-		ret = add_jump_label_module(mod);
-		if (ret)
-			remove_jump_label_module(mod);
+		jump_label_apply_nops(mod);
 		jump_label_unlock();
 		break;
-	case MODULE_STATE_GOING:
+	case MODULE_STATE_COMING:
 		jump_label_lock();
-		remove_jump_label_module(mod);
+		ret = jump_label_add_module(mod);
+		if (ret)
+			jump_label_del_module(mod);
 		jump_label_unlock();
 		break;
-	case MODULE_STATE_LIVE:
+	case MODULE_STATE_GOING:
 		jump_label_lock();
-		remove_jump_label_module_init(mod);
+		jump_label_del_module(mod);
 		jump_label_unlock();
 		break;
 	}
 	return ret;
 }
 
+struct notifier_block jump_label_module_nb = {
+	.notifier_call = jump_label_module_notify,
+	.priority = 1, /* higher than tracepoints */
+};
+
+static __init int jump_label_init_module(void)
+{
+	return register_module_notifier(&jump_label_module_nb);
+}
+early_initcall(jump_label_init_module);
+
+#endif /* CONFIG_MODULES */
+
 /***
- * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
- * @mod: module to patch
+ * jump_label_text_reserved - check if addr range is reserved
+ * @start: start text addr
+ * @end: end text addr
  *
- * Allow for run-time selection of the optimal nops. Before the module
- * loads patch these with arch_get_jump_label_nop(), which is specified by
- * the arch specific jump label code.
+ * checks if the text addr located between @start and @end
+ * overlaps with any of the jump label patch addresses. Code
+ * that wants to modify kernel text should first verify that
+ * it does not overlap with any of the jump label addresses.
+ * Caller must hold jump_label_mutex.
+ *
+ * returns 1 if there is an overlap, 0 otherwise
  */
-void jump_label_apply_nops(struct module *mod)
+int jump_label_text_reserved(void *start, void *end)
 {
-	struct jump_entry *iter;
+	int ret = __jump_label_text_reserved(__start___jump_table,
+			__stop___jump_table, start, end);
 
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
-		return;
+	if (ret)
+		return ret;
 
-	iter = mod->jump_entries;
-	while (iter < mod->jump_entries + mod->num_jump_entries) {
-		arch_jump_label_text_poke_early(iter->code);
-		iter++;
-	}
+#ifdef CONFIG_MODULES
+	ret = __jump_label_mod_text_reserved(start, end);
+#endif
+	return ret;
 }
 
-struct notifier_block jump_label_module_nb = {
-	.notifier_call = jump_label_module_notify,
-	.priority = 0,
-};
-
-static __init int init_jump_label_module(void)
+static void jump_label_update(struct jump_label_key *key, int enable)
 {
-	return register_module_notifier(&jump_label_module_nb);
-}
-early_initcall(init_jump_label_module);
+	struct jump_entry *entry = key->entries;
 
-#endif /* CONFIG_MODULES */
+	__jump_label_update(key, entry, enable);
+
+#ifdef CONFIG_MODULES
+	__jump_label_mod_update(key, enable);
+#endif
+}
 
 #endif
diff --git a/kernel/module.c b/kernel/module.c
index efa290e..890cadf 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2789,6 +2789,13 @@ static struct module *load_module(void __user *umod,
 		goto unlock;
 	}
 
+	err = blocking_notifier_call_chain(&module_notify_list,
+			MODULE_STATE_POST_RELOCATE, mod);
+	if (err != NOTIFY_DONE) {
+		err = notifier_to_errno(err);
+		goto unlock;
+	}
+
 	/* This has to be done once we're sure module name is unique. */
 	if (!mod->taints)
 		dynamic_debug_setup(info.debug, info.num_debug);
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a353a4d..7bacdd3 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -117,7 +117,7 @@ enum event_type_t {
 	EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
 };
 
-atomic_t perf_task_events __read_mostly;
+struct jump_label_key_counter perf_task_events __read_mostly;
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
@@ -2383,8 +2383,10 @@ static void free_event(struct perf_event *event)
 	irq_work_sync(&event->pending);
 
 	if (!event->parent) {
-		if (event->attach_state & PERF_ATTACH_TASK)
-			jump_label_dec(&perf_task_events);
+		if (event->attach_state & PERF_ATTACH_TASK) {
+			if (atomic_dec_and_test(&perf_task_events.ref))
+				jump_label_disable(&perf_task_events.key);
+		}
 		if (event->attr.mmap || event->attr.mmap_data)
 			atomic_dec(&nr_mmap_events);
 		if (event->attr.comm)
@@ -4912,7 +4914,7 @@ fail:
 	return err;
 }
 
-atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+struct jump_label_key_counter perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
 static void sw_perf_event_destroy(struct perf_event *event)
 {
@@ -4920,7 +4922,8 @@ static void sw_perf_event_destroy(struct perf_event *event)
 
 	WARN_ON(event->parent);
 
-	jump_label_dec(&perf_swevent_enabled[event_id]);
+	if (atomic_dec_and_test(&perf_swevent_enabled[event_id].ref))
+		jump_label_disable(&perf_swevent_enabled[event_id].key);
 	swevent_hlist_put(event);
 }
 
@@ -4945,12 +4948,15 @@ static int perf_swevent_init(struct perf_event *event)
 
 	if (!event->parent) {
 		int err;
+		atomic_t *ref;
 
 		err = swevent_hlist_get(event);
 		if (err)
 			return err;
 
-		jump_label_inc(&perf_swevent_enabled[event_id]);
+		ref = &perf_swevent_enabled[event_id].ref;
+		if (atomic_add_return(1, ref) == 1)
+			jump_label_enable(&perf_swevent_enabled[event_id].key);
 		event->destroy = sw_perf_event_destroy;
 	}
 
@@ -5123,6 +5129,10 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 	u64 period;
 
 	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+
+	if (event->state < PERF_EVENT_STATE_ACTIVE)
+		return HRTIMER_NORESTART;
+
 	event->pmu->read(event);
 
 	perf_sample_data_init(&data, 0);
@@ -5174,7 +5184,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
-		hrtimer_cancel(&hwc->hrtimer);
+		hrtimer_try_to_cancel(&hwc->hrtimer);
 	}
 }
 
@@ -5713,8 +5723,10 @@ done:
 	event->pmu = pmu;
 
 	if (!event->parent) {
-		if (event->attach_state & PERF_ATTACH_TASK)
-			jump_label_inc(&perf_task_events);
+		if (event->attach_state & PERF_ATTACH_TASK) {
+			if (atomic_add_return(1, &perf_task_events.ref) == 1)
+				jump_label_enable(&perf_task_events.key);
+		}
 		if (event->attr.mmap || event->attr.mmap_data)
 			atomic_inc(&nr_mmap_events);
 		if (event->attr.comm)
diff --git a/kernel/timer.c b/kernel/timer.c
index 343ff27..c848cd8 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -959,7 +959,7 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
  *
  * Synchronization rules: Callers must prevent restarting of the timer,
  * otherwise this function is meaningless. It must not be called from
- * hardirq contexts. The caller must not hold locks which would prevent
+ * interrupt contexts. The caller must not hold locks which would prevent
  * completion of the timer's handler. The timer's handler must not call
  * add_timer_on(). Upon exit the timer is not queued and the handler is
  * not running on any CPU.
@@ -971,12 +971,10 @@ int del_timer_sync(struct timer_list *timer)
 #ifdef CONFIG_LOCKDEP
 	unsigned long flags;
 
-	raw_local_irq_save(flags);
-	local_bh_disable();
+	local_irq_save(flags);
 	lock_map_acquire(&timer->lockdep_map);
 	lock_map_release(&timer->lockdep_map);
-	_local_bh_enable();
-	raw_local_irq_restore(flags);
+	local_irq_restore(flags);
 #endif
 	/*
 	 * don't use it in hardirq context, because it
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 68187af..13066e8 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -251,9 +251,9 @@ static void set_tracepoint(struct tracepoint_entry **entry,
 {
 	WARN_ON(strcmp((*entry)->name, elem->name) != 0);
 
-	if (elem->regfunc && !elem->state && active)
+	if (elem->regfunc && !jump_label_enabled(&elem->key) && active)
 		elem->regfunc();
-	else if (elem->unregfunc && elem->state && !active)
+	else if (elem->unregfunc && jump_label_enabled(&elem->key) && !active)
 		elem->unregfunc();
 
 	/*
@@ -264,13 +264,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
 	 * is used.
 	 */
 	rcu_assign_pointer(elem->funcs, (*entry)->funcs);
-	if (!elem->state && active) {
-		jump_label_enable(&elem->state);
-		elem->state = active;
-	} else if (elem->state && !active) {
-		jump_label_disable(&elem->state);
-		elem->state = active;
-	}
+	if (active)
+		jump_label_enable(&elem->key);
+	else if (!active)
+		jump_label_disable(&elem->key);
 }
 
 /*
@@ -281,13 +278,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
  */
 static void disable_tracepoint(struct tracepoint *elem)
 {
-	if (elem->unregfunc && elem->state)
+	if (elem->unregfunc && jump_label_enabled(&elem->key))
 		elem->unregfunc();
 
-	if (elem->state) {
-		jump_label_disable(&elem->state);
-		elem->state = 0;
-	}
+	jump_label_disable(&elem->key);
 	rcu_assign_pointer(elem->funcs, NULL);
 }
 




* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-12 18:47       ` Peter Zijlstra
@ 2011-02-14 12:27         ` Ingo Molnar
  2011-02-14 15:51         ` Jason Baron
  2011-02-14 16:11         ` Mathieu Desnoyers
  2 siblings, 0 replies; 113+ messages in thread
From: Ingo Molnar @ 2011-02-14 12:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Jason Baron, hpa, rostedt, tglx, andi, roland,
	rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney,
	michael, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> > 
> > So why can't we make that jump_label_entry::refcount and
> > jump_label_key::state an atomic_t and be done with it? 
> 
> So I had a bit of a poke at this because I didn't quite understand why
> all that stuff was as it was. I applied both Jason's patches and then
> basically rewrote kernel/jump_label.c just for kicks ;-)
> 
> I haven't tried compiling this, let alone running it, but provided I
> didn't actually forget anything the storage per key is now 16 bytes when
> modules are disabled and 24 * (1 + mods) bytes for when they are
> enabled. The old code had 64 + 40 * mods bytes.
> 
> I still need to clean up the static_branch_else bits and look at !x86
> aside from the already mentioned bits.. but what do people think?

[...]

>  15 files changed, 333 insertions(+), 456 deletions(-)

The diffstat win alone makes me want this :-)

Thanks,

	Ingo


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-12 18:47       ` Peter Zijlstra
  2011-02-14 12:27         ` Ingo Molnar
@ 2011-02-14 15:51         ` Jason Baron
  2011-02-14 15:57           ` Peter Zijlstra
  2011-02-14 16:11         ` Mathieu Desnoyers
  2 siblings, 1 reply; 113+ messages in thread
From: Jason Baron @ 2011-02-14 15:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Sat, Feb 12, 2011 at 07:47:45PM +0100, Peter Zijlstra wrote:
> On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> > 
> > So why can't we make that jump_label_entry::refcount and
> > jump_label_key::state an atomic_t and be done with it? 
> 
> So I had a bit of a poke at this because I didn't quite understand why
> all that stuff was as it was. I applied both Jason's patches and then
> basically rewrote kernel/jump_label.c just for kicks ;-)
> 
> I haven't tried compiling this, let alone running it, but provided I
> didn't actually forget anything the storage per key is now 16 bytes when
> modules are disabled and 24 * (1 + mods) bytes for when they are
> enabled. The old code had 64 + 40 * mods bytes.
> 
> I still need to clean up the static_branch_else bits and look at !x86
> aside from the already mentioned bits.. but what do people think?
> 
> ---

Generally, I really like this! It's the direction I think the jump label
code should be going. The complete removal of the hash table makes the
design a lot better and simpler. We just need to get some of the details
cleaned up, and of course we need this to compile :) But I don't see any
fundamental problems with this approach.

Things that still need to be sorted out:

1) Since jump_label.h is included in kernel.h (indirectly, via
dynamic_debug.h), the atomic_t definitions could be problematic, since
atomic.h includes kernel.h indirectly... so we might need some header
magic.

2) I had some code to disallow writing to a module's __init section by
setting the 'key' value to 0 after module->init was run, but before the
memory was freed. I then check for a non-zero key value when the jump
label is updated. In this way we can't corrupt some random piece of
memory. I had this done via the 'MODULE_STATE_LIVE' notifier.

3) For 'jump_label_enable()'/'jump_label_disable()' in the tracepoint
code, I'm not sure that there is an enable for each disable, so I'm not
sure a refcount would work there. But we can fix this by first checking
'jump_label_enabled()' before calling 'jump_label_enable()' or
jump_label_ref(). This is safe because the tracepoint code is protected
by the tracepoint_mutex.
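
The check-before-enable idea in item 3 can be sketched in plain userspace
C. This is only a model under stated assumptions, not the kernel code:
struct key_model, key_model_enabled() and key_model_set() are hypothetical
names, and the caller is assumed to hold whatever lock plays the role of
the tracepoint mutex, so the check and the update cannot race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct jump_label_key. */
struct key_model {
	atomic_int enabled;	/* reference count; non-zero means "on" */
};

static bool key_model_enabled(struct key_model *key)
{
	return atomic_load(&key->enabled) != 0;
}

/*
 * Switch the key on or off. Assumption: the caller holds the lock that
 * serializes all state changes, so checking first and then updating is
 * safe. Repeated calls with the same value leave the count untouched.
 */
static void key_model_set(struct key_model *key, bool active)
{
	if (active && !key_model_enabled(key))
		atomic_fetch_add(&key->enabled, 1);
	else if (!active && key_model_enabled(key))
		atomic_fetch_sub(&key->enabled, 1);
}
```

With the guard in place, two enables followed by one disable leave the
count at zero, which is what a caller without strictly paired
enable/disable calls needs from a refcounted key.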

thanks,

-Jason 



* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 15:51         ` Jason Baron
@ 2011-02-14 15:57           ` Peter Zijlstra
  2011-02-14 16:04             ` Jason Baron
  0 siblings, 1 reply; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 15:57 UTC (permalink / raw)
  To: Jason Baron
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, 2011-02-14 at 10:51 -0500, Jason Baron wrote:
> On Sat, Feb 12, 2011 at 07:47:45PM +0100, Peter Zijlstra wrote:
> > On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> > > 
> > > So why can't we make that jump_label_entry::refcount and
> > > jump_label_key::state an atomic_t and be done with it? 
> > 
> > So I had a bit of a poke at this because I didn't quite understand why
> > all that stuff was as it was. I applied both Jason's patches and then
> > basically rewrote kernel/jump_label.c just for kicks ;-)
> > 
> > I haven't tried compiling this, let alone running it, but provided I
> > didn't actually forget anything the storage per key is now 16 bytes when
> > modules are disabled and 24 * (1 + mods) bytes for when they are
> > enabled. The old code had 64 + 40 * mods bytes.
> > 
> > I still need to clean up the static_branch_else bits and look at !x86
> > aside from the already mentioned bits.. but what do people think?
> > 
> > ---
> 
> Generally, I really like this! It's the direction I think the jump label
> code should be going. The complete removal of the hash table makes the
> design a lot better and simpler. We just need to get some of the details
> cleaned up, and of course we need this to compile :) But I don't see any
> fundamental problems with this approach.
> 
> Things that still need to be sorted out:
> 
> 1) Since jump_label.h is included in kernel.h (indirectly, via
> dynamic_debug.h), the atomic_t definitions could be problematic, since
> atomic.h includes kernel.h indirectly... so we might need some header
> magic.

Yes, I remember running into that when I did the jump_label_ref stuff,
some head-scratching is in order there.

> 2) I had some code to disallow writing to a module's __init section by
> setting the 'key' value to 0 after module->init was run, but before the
> memory was freed. I then check for a non-zero key value when the jump
> label is updated. In this way we can't corrupt some random piece of
> memory. I had this done via the 'MODULE_STATE_LIVE' notifier.

AH! I wondered what that was about.. that wouldn't work now since we
actually rely on iter->key to remain what it was.

> 3) For 'jump_label_enable()'/'jump_label_disable()' in the tracepoint
> code, I'm not sure that there is an enable for each disable, so I'm not
> sure a refcount would work there. But we can fix this by first checking
> 'jump_label_enabled()' before calling 'jump_label_enable()' or
> jump_label_ref(). This is safe because the tracepoint code is protected
> by the tracepoint_mutex.

Right,.. I hadn't considered people using it like that, but like you
said, that should be easily fixed.


* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 15:57           ` Peter Zijlstra
@ 2011-02-14 16:04             ` Jason Baron
  2011-02-14 16:14               ` Mathieu Desnoyers
       [not found]               ` <BLU0-SMTP4069A1A89F06CDFF9B28F896D00@phx.gbl>
  0 siblings, 2 replies; 113+ messages in thread
From: Jason Baron @ 2011-02-14 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, Feb 14, 2011 at 04:57:04PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-02-14 at 10:51 -0500, Jason Baron wrote:
> > On Sat, Feb 12, 2011 at 07:47:45PM +0100, Peter Zijlstra wrote:
> > > On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> > > > 
> > > > So why can't we make that jump_label_entry::refcount and
> > > > jump_label_key::state an atomic_t and be done with it? 
> > > 
> > > So I had a bit of a poke at this because I didn't quite understand why
> > > all that stuff was as it was. I applied both Jason's patches and then
> > > basically rewrote kernel/jump_label.c just for kicks ;-)
> > > 
> > > I haven't tried compiling this, let alone running it, but provided I
> > > didn't actually forget anything the storage per key is now 16 bytes when
> > > modules are disabled and 24 * (1 + mods) bytes for when they are
> > > enabled. The old code had 64 + 40 * mods bytes.
> > > 
> > > I still need to clean up the static_branch_else bits and look at !x86
> > > aside from the already mentioned bits.. but what do people think?
> > > 
> > > ---
> > 
> > Generally, I really like this! It's the direction I think the jump label
> > code should be going. The complete removal of the hash table makes the
> > design a lot better and simpler. We just need to get some of the details
> > cleaned up, and of course we need this to compile :) But I don't see any
> > fundamental problems with this approach.
> > 
> > Things that still need to be sorted out:
> > 
> > 1) Since jump_label.h is included in kernel.h (indirectly, via
> > dynamic_debug.h), the atomic_t definitions could be problematic, since
> > atomic.h includes kernel.h indirectly... so we might need some header
> > magic.
> 
> Yes, I remember running into that when I did the jump_label_ref stuff,
> some head-scratching is in order there.
> 

Yes, I suspect this might be the hardest bit of this...

> > 2) I had some code to disallow writing to a module's __init section by
> > setting the 'key' value to 0 after module->init was run, but before the
> > memory was freed. I then check for a non-zero key value when the jump
> > label is updated. In this way we can't corrupt some random piece of
> > memory. I had this done via the 'MODULE_STATE_LIVE' notifier.
> 
> AH! I wondered what that was about.. that wouldn't work now since we
> actually rely on iter->key to remain what it was.
> 

We could just set iter->code or iter->target to 0 to indicate that the
entry is not valid, and leave iter->key as it is.
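
A minimal userspace sketch of that suggestion, assuming a simplified
entry layout: struct entry_model, invalidate_range() and count_patchable()
are hypothetical names used only for illustration. Entries whose patch
site falls in a freed range get code set to 0, the key field is left
alone, and the update path skips any entry with a zero code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a jump table entry; field names follow the thread. */
struct entry_model {
	unsigned long code;	/* address of the patch site; 0 = invalidated */
	unsigned long target;	/* address to jump to when enabled */
	unsigned long key;	/* left untouched so lookups by key still work */
};

/* Mark entries whose patch site lies in [start, end) as not patchable. */
static void invalidate_range(struct entry_model *e, size_t n,
			     unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].code >= start && e[i].code < end)
			e[i].code = 0;
}

/* Count the entries for @key that an update would still patch. */
static size_t count_patchable(const struct entry_model *e, size_t n,
			      unsigned long key)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].key == key && e[i].code != 0)
			hits++;
	return hits;
}
```

In a real update path the counting loop would be the place where each
surviving entry is handed to the architecture-specific patching routine;
the point here is only that the validity marker and the key are separate
fields.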



* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-12 18:47       ` Peter Zijlstra
  2011-02-14 12:27         ` Ingo Molnar
  2011-02-14 15:51         ` Jason Baron
@ 2011-02-14 16:11         ` Mathieu Desnoyers
  2 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 16:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> > 
> > So why can't we make that jump_label_entry::refcount and
> > jump_label_key::state an atomic_t and be done with it? 
> 
> So I had a bit of a poke at this because I didn't quite understand why
> all that stuff was as it was. I applied both Jason's patches and then
> basically rewrote kernel/jump_label.c just for kicks ;-)
> 
> I haven't tried compiling this, let alone running it, but provided I
> didn't actually forget anything the storage per key is now 16 bytes when
> modules are disabled and 24 * (1 + mods) bytes for when they are
> enabled. The old code had 64 + 40 * mods bytes.
> 
> I still need to clean up the static_branch_else bits and look at !x86
> aside from the already mentioned bits.. but what do people think?

Hi Peter,

It looks like a huge step in the right direction. I'm sure that once
Jason and you finish ironing out the details, this will be a huge
improvement in terms of shrinking code and API complexity.

Thanks,

Mathieu

> 
> ---
>  arch/sparc/include/asm/jump_label.h |   25 +-
>  arch/x86/include/asm/jump_label.h   |   22 +-
>  arch/x86/kernel/jump_label.c        |    2 +-
>  arch/x86/kernel/module.c            |    3 -
>  include/linux/dynamic_debug.h       |   10 +-
>  include/linux/jump_label.h          |   71 +++---
>  include/linux/jump_label_ref.h      |   36 +--
>  include/linux/module.h              |    1 +
>  include/linux/perf_event.h          |   28 +-
>  include/linux/tracepoint.h          |    8 +-
>  kernel/jump_label.c                 |  516 +++++++++++++----------------------
>  kernel/module.c                     |    7 +
>  kernel/perf_event.c                 |   30 ++-
>  kernel/timer.c                      |    8 +-
>  kernel/tracepoint.c                 |   22 +-
>  15 files changed, 333 insertions(+), 456 deletions(-)
> 
> diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h
> index 427d468..e4ca085 100644
> --- a/arch/sparc/include/asm/jump_label.h
> +++ b/arch/sparc/include/asm/jump_label.h
> @@ -7,17 +7,20 @@
>  
>  #define JUMP_LABEL_NOP_SIZE 4
>  
> -#define JUMP_LABEL(key, label)					\
> -	do {							\
> -		asm goto("1:\n\t"				\
> -			 "nop\n\t"				\
> -			 "nop\n\t"				\
> -			 ".pushsection __jump_table,  \"a\"\n\t"\
> -			 ".align 4\n\t"				\
> -			 ".word 1b, %l[" #label "], %c0\n\t"	\
> -			 ".popsection \n\t"			\
> -			 : :  "i" (key) :  : label);\
> -	} while (0)
> +static __always_inline bool __static_branch(struct jump_label_key *key)
> +{
> +		asm goto("1:\n\t"
> +			 "nop\n\t"
> +			 "nop\n\t"
> +			 ".pushsection __jump_table,  \"a\"\n\t"
> +			 ".align 4\n\t"
> +			 ".word 1b, %l[l_yes], %c0\n\t"
> +			 ".popsection \n\t"
> +			 : :  "i" (key) :  : l_yes);
> +	return false;
> +l_yes:
> +	return true;
> +}
>  
>  #endif /* __KERNEL__ */
>  
> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> index 574dbc2..3d44a7c 100644
> --- a/arch/x86/include/asm/jump_label.h
> +++ b/arch/x86/include/asm/jump_label.h
> @@ -5,20 +5,24 @@
>  
>  #include <linux/types.h>
>  #include <asm/nops.h>
> +#include <asm/asm.h>
>  
>  #define JUMP_LABEL_NOP_SIZE 5
>  
>  # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
>  
> -# define JUMP_LABEL(key, label)					\
> -	do {							\
> -		asm goto("1:"					\
> -			JUMP_LABEL_INITIAL_NOP			\
> -			".pushsection __jump_table,  \"aw\" \n\t"\
> -			_ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
> -			".popsection \n\t"			\
> -			: :  "i" (key) :  : label);		\
> -	} while (0)
> +static __always_inline bool __static_branch(struct jump_label_key *key)
> +{
> +	asm goto("1:"
> +		JUMP_LABEL_INITIAL_NOP
> +		".pushsection __jump_table,  \"a\" \n\t"
> +		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> +		".popsection \n\t"
> +		: :  "i" (key) : : l_yes );
> +	return false;
> +l_yes:
> +	return true;
> +}
>  
>  #endif /* __KERNEL__ */
>  
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 961b6b3..dfa4c3c 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -4,13 +4,13 @@
>   * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
>   *
>   */
> -#include <linux/jump_label.h>
>  #include <linux/memory.h>
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
>  #include <linux/list.h>
>  #include <linux/jhash.h>
>  #include <linux/cpu.h>
> +#include <linux/jump_label.h>
>  #include <asm/kprobes.h>
>  #include <asm/alternative.h>
>  
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index ab23f1a..0e6b823 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -230,9 +230,6 @@ int module_finalize(const Elf_Ehdr *hdr,
>  		apply_paravirt(pseg, pseg + para->sh_size);
>  	}
>  
> -	/* make jump label nops */
> -	jump_label_apply_nops(me);
> -
>  	return 0;
>  }
>  
> diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
> index 1c70028..2ade291 100644
> --- a/include/linux/dynamic_debug.h
> +++ b/include/linux/dynamic_debug.h
> @@ -33,7 +33,7 @@ struct _ddebug {
>  #define _DPRINTK_FLAGS_PRINT   (1<<0)  /* printk() a message using the format */
>  #define _DPRINTK_FLAGS_DEFAULT 0
>  	unsigned int flags:8;
> -	char enabled;
> +	struct jump_label_key enabled;
>  } __attribute__((aligned(8)));
>  
>  
> @@ -48,8 +48,8 @@ extern int ddebug_remove_module(const char *mod_name);
>  	__used								\
>  	__attribute__((section("__verbose"), aligned(8))) =		\
>  	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
> -		_DPRINTK_FLAGS_DEFAULT };				\
> -	if (unlikely(descriptor.enabled))				\
> +		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
> +	if (static_branch(&descriptor.enabled))				\
>  		printk(KERN_DEBUG pr_fmt(fmt),	##__VA_ARGS__);		\
>  	} while (0)
>  
> @@ -59,8 +59,8 @@ extern int ddebug_remove_module(const char *mod_name);
>  	__used								\
>  	__attribute__((section("__verbose"), aligned(8))) =		\
>  	{ KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__,		\
> -		_DPRINTK_FLAGS_DEFAULT };				\
> -	if (unlikely(descriptor.enabled))				\
> +		_DPRINTK_FLAGS_DEFAULT, JUMP_LABEL_INIT };		\
> +	if (static_branch(&descriptor.enabled))				\
>  		dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__);	\
>  	} while (0)
>  
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 7880f18..a1cec0a 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -2,19 +2,35 @@
>  #define _LINUX_JUMP_LABEL_H
>  
>  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
> +
> +struct jump_label_key {
> +	atomic_t enabled;
> +	struct jump_entry *entries;
> +#ifdef CONFIG_MODULES
> +	struct jump_module *next;
> +#endif
> +};
> +
>  # include <asm/jump_label.h>
>  # define HAVE_JUMP_LABEL
>  #endif
>  
>  enum jump_label_type {
> +	JUMP_LABEL_DISABLE = 0,
>  	JUMP_LABEL_ENABLE,
> -	JUMP_LABEL_DISABLE
>  };
>  
>  struct module;
>  
> +#define JUMP_LABEL_INIT { 0 }
> +
>  #ifdef HAVE_JUMP_LABEL
>  
> +static __always_inline bool static_branch(struct jump_label_key *key)
> +{
> +	return __static_branch(key);
> +}
> +
>  extern struct jump_entry __start___jump_table[];
>  extern struct jump_entry __stop___jump_table[];
>  
> @@ -23,37 +39,31 @@ extern void jump_label_unlock(void);
>  extern void arch_jump_label_transform(struct jump_entry *entry,
>  				 enum jump_label_type type);
>  extern void arch_jump_label_text_poke_early(jump_label_t addr);
> -extern void jump_label_update(unsigned long key, enum jump_label_type type);
> -extern void jump_label_apply_nops(struct module *mod);
>  extern int jump_label_text_reserved(void *start, void *end);
> -
> -#define jump_label_enable(key) \
> -	jump_label_update((unsigned long)key, JUMP_LABEL_ENABLE);
> -
> -#define jump_label_disable(key) \
> -	jump_label_update((unsigned long)key, JUMP_LABEL_DISABLE);
> +extern void jump_label_enable(struct jump_label_key *key);
> +extern void jump_label_disable(struct jump_label_key *key);
>  
>  #else
>  
> -#define JUMP_LABEL(key, label)			\
> -do {						\
> -	if (unlikely(*key))			\
> -		goto label;			\
> -} while (0)
> +struct jump_label_key {
> +	atomic_t enabled;
> +};
>  
> -#define jump_label_enable(cond_var)	\
> -do {					\
> -       *(cond_var) = 1;			\
> -} while (0)
> +static __always_inline bool static_branch(struct jump_label_key *key)
> +{
> +	if (unlikely(atomic_read(&key->state)))
> +		return true;
> +	return false;
> +}
>  
> -#define jump_label_disable(cond_var)	\
> -do {					\
> -       *(cond_var) = 0;			\
> -} while (0)
> +static inline void jump_label_enable(struct jump_label_key *key)
> +{
> +	atomic_inc(&key->state);
> +}
>  
> -static inline int jump_label_apply_nops(struct module *mod)
> +static inline void jump_label_disable(struct jump_label_key *key)
>  {
> -	return 0;
> +	atomic_dec(&key->state);
>  }
>  
>  static inline int jump_label_text_reserved(void *start, void *end)
> @@ -66,14 +76,9 @@ static inline void jump_label_unlock(void) {}
>  
>  #endif
>  
> -#define COND_STMT(key, stmt)					\
> -do {								\
> -	__label__ jl_enabled;					\
> -	JUMP_LABEL(key, jl_enabled);				\
> -	if (0) {						\
> -jl_enabled:							\
> -		stmt;						\
> -	}							\
> -} while (0)
> +static inline bool jump_label_enabled(struct jump_label_key *key)
> +{
> +	return !!atomic_read(&key->state);
> +}
>  
>  #endif
> diff --git a/include/linux/jump_label_ref.h b/include/linux/jump_label_ref.h
> index e5d012a..5178696 100644
> --- a/include/linux/jump_label_ref.h
> +++ b/include/linux/jump_label_ref.h
> @@ -4,41 +4,27 @@
>  #include <linux/jump_label.h>
>  #include <asm/atomic.h>
>  
> -#ifdef HAVE_JUMP_LABEL
> +struct jump_label_key_counter {
> +	atomic_t ref;
> +	struct jump_label_key key;
> +};
>  
> -static inline void jump_label_inc(atomic_t *key)
> -{
> -	if (atomic_add_return(1, key) == 1)
> -		jump_label_enable(key);
> -}
> +#ifdef HAVE_JUMP_LABEL
>  
> -static inline void jump_label_dec(atomic_t *key)
> +static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
>  {
> -	if (atomic_dec_and_test(key))
> -		jump_label_disable(key);
> +	return __static_branch(key);
>  }
>  
>  #else /* !HAVE_JUMP_LABEL */
>  
> -static inline void jump_label_inc(atomic_t *key)
> +static __always_inline bool static_branch_else_atomic_read(struct jump_label_key *key, atomic_t *count)
>  {
> -	atomic_inc(key);
> +	if (unlikely(atomic_read(count)))
> +		return true;
> +	return false;
>  }
>  
> -static inline void jump_label_dec(atomic_t *key)
> -{
> -	atomic_dec(key);
> -}
> -
> -#undef JUMP_LABEL
> -#define JUMP_LABEL(key, label)						\
> -do {									\
> -	if (unlikely(__builtin_choose_expr(				\
> -	      __builtin_types_compatible_p(typeof(key), atomic_t *),	\
> -	      atomic_read((atomic_t *)(key)), *(key))))			\
> -		goto label;						\
> -} while (0)
> -
>  #endif /* HAVE_JUMP_LABEL */
>  
>  #endif /* _LINUX_JUMP_LABEL_REF_H */
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 9bdf27c..eeb3e99 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -266,6 +266,7 @@ enum module_state
>  	MODULE_STATE_LIVE,
>  	MODULE_STATE_COMING,
>  	MODULE_STATE_GOING,
> +	MODULE_STATE_POST_RELOCATE,
>  };
>  
>  struct module
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index dda5b0a..26fe115 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1000,7 +1000,7 @@ static inline int is_software_event(struct perf_event *event)
>  	return event->pmu->task_ctx_nr == perf_sw_context;
>  }
>  
> -extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
> +extern struct jump_label_key_counter perf_swevent_enabled[PERF_COUNT_SW_MAX];
>  
>  extern void __perf_sw_event(u32, u64, int, struct pt_regs *, u64);
>  
> @@ -1029,30 +1029,32 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
>  {
>  	struct pt_regs hot_regs;
>  
> -	JUMP_LABEL(&perf_swevent_enabled[event_id], have_event);
> -	return;
> -
> -have_event:
> -	if (!regs) {
> -		perf_fetch_caller_regs(&hot_regs);
> -		regs = &hot_regs;
> +	if (static_branch_else_atomic_read(&perf_swevent_enabled[event_id].key,
> +					 &perf_swevent_enabled[event_id].ref)) {
> +		if (!regs) {
> +			perf_fetch_caller_regs(&hot_regs);
> +			regs = &hot_regs;
> +		}
> +		__perf_sw_event(event_id, nr, nmi, regs, addr);
>  	}
> -	__perf_sw_event(event_id, nr, nmi, regs, addr);
>  }
>  
> -extern atomic_t perf_task_events;
> +extern struct jump_label_key_counter perf_task_events;
>  
>  static inline void perf_event_task_sched_in(struct task_struct *task)
>  {
> -	COND_STMT(&perf_task_events, __perf_event_task_sched_in(task));
> +	if (static_branch_else_atomic_read(&perf_task_events.key,
> +					   &perf_task_events.ref))
> +		__perf_event_task_sched_in(task);
>  }
>  
>  static inline
>  void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
>  {
>  	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
> -
> -	COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
> +	if (static_branch_else_atomic_read(&perf_task_events.key,
> +					   &perf_task_events.ref))
> +		__perf_event_task_sched_out(task, next);
>  }
>  
>  extern void perf_event_mmap(struct vm_area_struct *vma);
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 97c84a5..6c8c747 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -29,7 +29,7 @@ struct tracepoint_func {
>  
>  struct tracepoint {
>  	const char *name;		/* Tracepoint name */
> -	int state;			/* State. */
> +	struct jump_label_key key;
>  	void (*regfunc)(void);
>  	void (*unregfunc)(void);
>  	struct tracepoint_func __rcu *funcs;
> @@ -146,9 +146,7 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
>  	extern struct tracepoint __tracepoint_##name;			\
>  	static inline void trace_##name(proto)				\
>  	{								\
> -		JUMP_LABEL(&__tracepoint_##name.state, do_trace);	\
> -		return;							\
> -do_trace:								\
> +		if (static_branch(&__tracepoint_##name.key))		\
>  			__DO_TRACE(&__tracepoint_##name,		\
>  				TP_PROTO(data_proto),			\
>  				TP_ARGS(data_args),			\
> @@ -181,7 +179,7 @@ do_trace:								\
>  	__attribute__((section("__tracepoints_strings"))) = #name;	\
>  	struct tracepoint __tracepoint_##name				\
>  	__attribute__((section("__tracepoints"))) =			\
> -		{ __tpstrtab_##name, 0, reg, unreg, NULL };		\
> +		{ __tpstrtab_##name, JUMP_LABEL_INIT, reg, unreg, NULL };\
>  	static struct tracepoint * const __tracepoint_ptr_##name __used	\
>  	__attribute__((section("__tracepoints_ptrs"))) =		\
>  		&__tracepoint_##name;
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 3b79bd9..29b34be 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -2,9 +2,9 @@
>   * jump label support
>   *
>   * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
> + * Copyright (C) 2011 Peter Zijlstra <pzijlstr@redhat.com>
>   *
>   */
> -#include <linux/jump_label.h>
>  #include <linux/memory.h>
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
> @@ -13,32 +13,13 @@
>  #include <linux/slab.h>
>  #include <linux/sort.h>
>  #include <linux/err.h>
> +#include <linux/jump_label.h>
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> -#define JUMP_LABEL_HASH_BITS 6
> -#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
> -static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
> -
>  /* mutex to protect coming/going of the the jump_label table */
>  static DEFINE_MUTEX(jump_label_mutex);
>  
> -struct jump_label_entry {
> -	struct hlist_node hlist;
> -	struct jump_entry *table;
> -	int nr_entries;
> -	/* hang modules off here */
> -	struct hlist_head modules;
> -	unsigned long key;
> -};
> -
> -struct jump_label_module_entry {
> -	struct hlist_node hlist;
> -	struct jump_entry *table;
> -	int nr_entries;
> -	struct module *mod;
> -};
> -
>  void jump_label_lock(void)
>  {
>  	mutex_lock(&jump_label_mutex);
> @@ -64,7 +45,7 @@ static int jump_label_cmp(const void *a, const void *b)
>  }
>  
>  static void
> -sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
> +jump_label_sort_entries(struct jump_entry *start, struct jump_entry *stop)
>  {
>  	unsigned long size;
>  
> @@ -73,118 +54,25 @@ sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
>  	sort(start, size, sizeof(struct jump_entry), jump_label_cmp, NULL);
>  }
>  
> -static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
> -{
> -	struct hlist_head *head;
> -	struct hlist_node *node;
> -	struct jump_label_entry *e;
> -	u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
> -
> -	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
> -	hlist_for_each_entry(e, node, head, hlist) {
> -		if (key == e->key)
> -			return e;
> -	}
> -	return NULL;
> -}
> +static void jump_label_update(struct jump_label_key *key, int enable);
>  
> -static struct jump_label_entry *
> -add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
> +void jump_label_enable(struct jump_label_key *key)
>  {
> -	struct hlist_head *head;
> -	struct jump_label_entry *e;
> -	u32 hash;
> -
> -	e = get_jump_label_entry(key);
> -	if (e)
> -		return ERR_PTR(-EEXIST);
> -
> -	e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
> -	if (!e)
> -		return ERR_PTR(-ENOMEM);
> -
> -	hash = jhash((void *)&key, sizeof(jump_label_t), 0);
> -	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
> -	e->key = key;
> -	e->table = table;
> -	e->nr_entries = nr_entries;
> -	INIT_HLIST_HEAD(&(e->modules));
> -	hlist_add_head(&e->hlist, head);
> -	return e;
> -}
> +	if (atomic_inc_not_zero(&key->enabled))
> +		return;
>  
> -static int
> -build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
> -{
> -	struct jump_entry *iter, *iter_begin;
> -	struct jump_label_entry *entry;
> -	int count;
> -
> -	sort_jump_label_entries(start, stop);
> -	iter = start;
> -	while (iter < stop) {
> -		entry = get_jump_label_entry(iter->key);
> -		if (!entry) {
> -			iter_begin = iter;
> -			count = 0;
> -			while ((iter < stop) &&
> -				(iter->key == iter_begin->key)) {
> -				iter++;
> -				count++;
> -			}
> -			entry = add_jump_label_entry(iter_begin->key,
> -							count, iter_begin);
> -			if (IS_ERR(entry))
> -				return PTR_ERR(entry);
> -		 } else {
> -			WARN_ONCE(1, KERN_ERR "build_jump_hashtable: unexpected entry!\n");
> -			return -1;
> -		}
> -	}
> -	return 0;
> +	jump_label_lock();
> +	if (atomic_add_return(1, &key->enabled) == 1)
> +		jump_label_update(key, JUMP_LABEL_ENABLE);
> +	jump_label_unlock();
>  }
>  
> -/***
> - * jump_label_update - update jump label text
> - * @key -  key value associated with a a jump label
> - * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
> - *
> - * Will enable/disable the jump for jump label @key, depending on the
> - * value of @type.
> - *
> - */
> -
> -void jump_label_update(unsigned long key, enum jump_label_type type)
> +void jump_label_disable(struct jump_label_key *key)
>  {
> -	struct jump_entry *iter;
> -	struct jump_label_entry *entry;
> -	struct hlist_node *module_node;
> -	struct jump_label_module_entry *e_module;
> -	int count;
> +	if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex))
> +		return;
>  
> -	jump_label_lock();
> -	entry = get_jump_label_entry((jump_label_t)key);
> -	if (entry) {
> -		count = entry->nr_entries;
> -		iter = entry->table;
> -		while (count--) {
> -			if (kernel_text_address(iter->code))
> -				arch_jump_label_transform(iter, type);
> -			iter++;
> -		}
> -		/* eanble/disable jump labels in modules */
> -		hlist_for_each_entry(e_module, module_node, &(entry->modules),
> -							hlist) {
> -			count = e_module->nr_entries;
> -			iter = e_module->table;
> -			while (count--) {
> -				if (iter->key &&
> -						kernel_text_address(iter->code))
> -					arch_jump_label_transform(iter, type);
> -				iter++;
> -			}
> -		}
> -	}
> +	jump_label_update(key, JUMP_LABEL_DISABLE);
>  	jump_label_unlock();
>  }
>  
> @@ -197,77 +85,30 @@ static int addr_conflict(struct jump_entry *entry, void *start, void *end)
>  	return 0;
>  }
>  
> -#ifdef CONFIG_MODULES
> -
> -static int module_conflict(void *start, void *end)
> -{
> -	struct hlist_head *head;
> -	struct hlist_node *node, *node_next, *module_node, *module_node_next;
> -	struct jump_label_entry *e;
> -	struct jump_label_module_entry *e_module;
> -	struct jump_entry *iter;
> -	int i, count;
> -	int conflict = 0;
> -
> -	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> -		head = &jump_label_table[i];
> -		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> -			hlist_for_each_entry_safe(e_module, module_node,
> -							module_node_next,
> -							&(e->modules), hlist) {
> -				count = e_module->nr_entries;
> -				iter = e_module->table;
> -				while (count--) {
> -					if (addr_conflict(iter, start, end)) {
> -						conflict = 1;
> -						goto out;
> -					}
> -					iter++;
> -				}
> -			}
> -		}
> -	}
> -out:
> -	return conflict;
> -}
> -
> -#endif
> -
> -/***
> - * jump_label_text_reserved - check if addr range is reserved
> - * @start: start text addr
> - * @end: end text addr
> - *
> - * checks if the text addr located between @start and @end
> - * overlaps with any of the jump label patch addresses. Code
> - * that wants to modify kernel text should first verify that
> - * it does not overlap with any of the jump label addresses.
> - * Caller must hold jump_label_mutex.
> - *
> - * returns 1 if there is an overlap, 0 otherwise
> - */
> -int jump_label_text_reserved(void *start, void *end)
> +static int __jump_label_text_reserved(struct jump_entry *iter_start,
> +		struct jump_entry *iter_stop, void *start, void *end)
>  {
>  	struct jump_entry *iter;
> -	struct jump_entry *iter_start = __start___jump_table;
> -	struct jump_entry *iter_stop = __start___jump_table;
> -	int conflict = 0;
>  
>  	iter = iter_start;
>  	while (iter < iter_stop) {
> -		if (addr_conflict(iter, start, end)) {
> -			conflict = 1;
> -			goto out;
> -		}
> +		if (addr_conflict(iter, start, end))
> +			return 1;
>  		iter++;
>  	}
>  
> -	/* now check modules */
> -#ifdef CONFIG_MODULES
> -	conflict = module_conflict(start, end);
> -#endif
> -out:
> -	return conflict;
> +	return 0;
> +}
> +
> +static void __jump_label_update(struct jump_label_key *key, 
> +		struct jump_entry *entry, int enable)
> +{
> +	for (; entry->key == (jump_label_t)key; entry++) {
> +		if (WARN_ON_ONCE(!kernel_text_address(entry->code)))
> +			continue;
> +
> +		arch_jump_label_transform(entry, enable);
> +	}
>  }
>  
>  /*
> @@ -277,141 +118,155 @@ void __weak arch_jump_label_text_poke_early(jump_label_t addr)
>  {
>  }
>  
> -static __init int init_jump_label(void)
> +static __init int jump_label_init(void)
>  {
> -	int ret;
>  	struct jump_entry *iter_start = __start___jump_table;
>  	struct jump_entry *iter_stop = __stop___jump_table;
> +	struct jump_label_key *key = NULL;
>  	struct jump_entry *iter;
>  
>  	jump_label_lock();
> -	ret = build_jump_label_hashtable(__start___jump_table,
> -					 __stop___jump_table);
> -	iter = iter_start;
> -	while (iter < iter_stop) {
> +	jump_label_sort_entries(iter_start, iter_stop);
> +
> +	for (iter = iter_start; iter < iter_stop; iter++) {
>  		arch_jump_label_text_poke_early(iter->code);
> -		iter++;
> +		if (iter->key == (jump_label_t)key)
> +			continue;
> +
> +		key = (struct jump_label_key *)iter->key;
> +		atomic_set(&key->enabled, 0);
> +		key->entries = iter;
> +#ifdef CONFIG_MODULES
> +		key->next = NULL;
> +#endif
>  	}
>  	jump_label_unlock();
> -	return ret;
> +
> +	return 0;
>  }
> -early_initcall(init_jump_label);
> +early_initcall(jump_label_init);
>  
>  #ifdef CONFIG_MODULES
>  
> -static struct jump_label_module_entry *
> -add_jump_label_module_entry(struct jump_label_entry *entry,
> -			    struct jump_entry *iter_begin,
> -			    int count, struct module *mod)
> -{
> -	struct jump_label_module_entry *e;
> -
> -	e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
> -	if (!e)
> -		return ERR_PTR(-ENOMEM);
> -	e->mod = mod;
> -	e->nr_entries = count;
> -	e->table = iter_begin;
> -	hlist_add_head(&e->hlist, &entry->modules);
> -	return e;
> -}
> +struct jump_label_mod {
> +	struct jump_label_mod *next;
> +	struct jump_entry *entries;
> +	struct module *mod;
> +};
>  
> -static int add_jump_label_module(struct module *mod)
> +static int __jump_label_mod_text_reserved(void *start, void *end)
>  {
> -	struct jump_entry *iter, *iter_begin;
> -	struct jump_label_entry *entry;
> -	struct jump_label_module_entry *module_entry;
> -	int count;
> +	struct module *mod;
>  
> -	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> +	mod = __module_text_address(start);
> +	if (!mod)
>  		return 0;
>  
> -	sort_jump_label_entries(mod->jump_entries,
> +	WARN_ON_ONCE(__module_text_address(end) != mod);
> +
> +	return __jump_label_text_reserved(mod->jump_entries,
>  				mod->jump_entries + mod->num_jump_entries);
> -	iter = mod->jump_entries;
> -	while (iter < mod->jump_entries + mod->num_jump_entries) {
> -		entry = get_jump_label_entry(iter->key);
> -		iter_begin = iter;
> -		count = 0;
> -		while ((iter < mod->jump_entries + mod->num_jump_entries) &&
> -			(iter->key == iter_begin->key)) {
> -				iter++;
> -				count++;
> -		}
> -		if (!entry) {
> -			entry = add_jump_label_entry(iter_begin->key, 0, NULL);
> -			if (IS_ERR(entry))
> -				return PTR_ERR(entry);
> -		}
> -		module_entry = add_jump_label_module_entry(entry, iter_begin,
> -							   count, mod);
> -		if (IS_ERR(module_entry))
> -			return PTR_ERR(module_entry);
> +}
> +
> +static void __jump_label_mod_update(struct jump_label_key *key, int enable)
> +{
> +	struct jump_label_mod *mod = key->next;
> +
> +	while (mod) {
> +		__jump_label_update(key, mod->entries, enable);
> +		mod = mod->next;
>  	}
> -	return 0;
>  }
>  
> -static void remove_jump_label_module(struct module *mod)
> +/***
> + * jump_label_apply_nops - patch module jump labels with arch_get_jump_label_nop()
> + * @mod: module to patch
> + *
> + * Allow for run-time selection of the optimal nops. Before the module
> + * loads patch these with arch_get_jump_label_nop(), which is specified by
> + * the arch specific jump label code.
> + */
> +static void jump_label_apply_nops(struct module *mod)
>  {
> -	struct hlist_head *head;
> -	struct hlist_node *node, *node_next, *module_node, *module_node_next;
> -	struct jump_label_entry *e;
> -	struct jump_label_module_entry *e_module;
> -	int i;
> +	struct jump_entry *iter_start = mod->jump_entries;
> +	struct jump_entry *iter_stop = mod->jump_entries + mod->num_jump_entries;
> +	struct jump_entry *iter;
>  
>  	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> +	if (iter_start == iter_stop)
>  		return;
>  
> -	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> -		head = &jump_label_table[i];
> -		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> -			hlist_for_each_entry_safe(e_module, module_node,
> -						  module_node_next,
> -						  &(e->modules), hlist) {
> -				if (e_module->mod == mod) {
> -					hlist_del(&e_module->hlist);
> -					kfree(e_module);
> -				}
> -			}
> -			if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
> -				hlist_del(&e->hlist);
> -				kfree(e);
> -			}
> +	jump_label_sort_entries(iter_start, iter_stop);
> +
> +	for (iter = iter_start; iter < iter_stop; iter++)
> +		arch_jump_label_text_poke_early(iter->code);
> +}
> +
> +static int jump_label_add_module(struct module *mod)
> +{
> +	struct jump_entry *iter_start = mod->jump_entries;
> +	struct jump_entry *iter_stop = mod->jump_entries + mod->num_jump_entries;
> +	struct jump_entry *iter;
> +	struct jump_label_key *key = NULL;
> +	struct jump_label_mod *jlm;
> +
> +	for (iter = iter_start; iter < iter_stop; iter++) {
> +		if (iter->key == (jump_label_t)key)
> +			continue;
> +
> +		key = (struct jump_label_key *)iter->key;
> +
> +		if (__module_address(iter->key) == mod) {
> +			atomic_set(&key->enabled, 0);
> +			key->entries = iter;
> +			key->next = NULL;
> +			continue;
>  		}
> +
> +		jlm = kzalloc(sizeof(struct jump_label_mod), GFP_KERNEL);
> +		if (!jlm)
> +			return -ENOMEM;
> +
> +		jlm->mod = mod;
> +		jlm->entries = iter;
> +		jlm->next = key->next;
> +		key->next = jlm;
> +
> +		if (jump_label_enabled(key))
> +			__jump_label_update(key, iter, JUMP_LABEL_ENABLE);
>  	}
> +
> +	return 0;
>  }
>  
> -static void remove_jump_label_module_init(struct module *mod)
> +static void jump_label_del_module(struct module *mod)
>  {
> -	struct hlist_head *head;
> -	struct hlist_node *node, *node_next, *module_node, *module_node_next;
> -	struct jump_label_entry *e;
> -	struct jump_label_module_entry *e_module;
> +	struct jump_entry *iter_start = mod->jump_entries;
> +	struct jump_entry *iter_stop = mod->jump_entries + mod->num_jump_entries;
>  	struct jump_entry *iter;
> -	int i, count;
> +	struct jump_label_key *key = NULL;
> +	struct jump_label_mod *jlm, **prev;
>  
> -	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> -		return;
> +	for (iter = iter_start; iter < iter_stop; iter++) {
> +		if (iter->key == (jump_label_t)key)
> +			continue;
> +
> +		key = (struct jump_label_key *)iter->key;
> +
> +		if (__module_address(iter->key) == mod)
> +			continue;
> +
> +		prev = &key->next;
> +		jlm = key->next;
> +
> +		while (jlm && jlm->mod != mod) {
> +			prev = &jlm->next;
> +			jlm = jlm->next;
> +		}
>  
> -	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> -		head = &jump_label_table[i];
> -		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> -			hlist_for_each_entry_safe(e_module, module_node,
> -						  module_node_next,
> -						  &(e->modules), hlist) {
> -				if (e_module->mod != mod)
> -					continue;
> -				count = e_module->nr_entries;
> -				iter = e_module->table;
> -				while (count--) {
> -					if (within_module_init(iter->code, mod))
> -						iter->key = 0;
> -					iter++;
> -				}
> -			}
> +		if (jlm) {
> +			*prev = jlm->next;
> +			kfree(jlm);
>  		}
>  	}
>  }
> @@ -424,61 +279,76 @@ jump_label_module_notify(struct notifier_block *self, unsigned long val,
>  	int ret = 0;
>  
>  	switch (val) {
> -	case MODULE_STATE_COMING:
> +	case MODULE_STATE_POST_RELOCATE:
>  		jump_label_lock();
> -		ret = add_jump_label_module(mod);
> -		if (ret)
> -			remove_jump_label_module(mod);
> +		jump_label_apply_nops(mod);
>  		jump_label_unlock();
>  		break;
> -	case MODULE_STATE_GOING:
> +	case MODULE_STATE_COMING:
>  		jump_label_lock();
> -		remove_jump_label_module(mod);
> +		ret = jump_label_add_module(mod);
> +		if (ret)
> +			jump_label_del_module(mod);
>  		jump_label_unlock();
>  		break;
> -	case MODULE_STATE_LIVE:
> +	case MODULE_STATE_GOING:
>  		jump_label_lock();
> -		remove_jump_label_module_init(mod);
> +		jump_label_del_module(mod);
>  		jump_label_unlock();
>  		break;
>  	}
>  	return ret;
>  }
>  
> +struct notifier_block jump_label_module_nb = {
> +	.notifier_call = jump_label_module_notify,
> +	.priority = 1, /* higher than tracepoints */
> +};
> +
> +static __init int jump_label_init_module(void)
> +{
> +	return register_module_notifier(&jump_label_module_nb);
> +}
> +early_initcall(jump_label_init_module);
> +
> +#endif /* CONFIG_MODULES */
> +
>  /***
> - * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
> - * @mod: module to patch
> + * jump_label_text_reserved - check if addr range is reserved
> + * @start: start text addr
> + * @end: end text addr
>   *
> - * Allow for run-time selection of the optimal nops. Before the module
> - * loads patch these with arch_get_jump_label_nop(), which is specified by
> - * the arch specific jump label code.
> + * checks if the text addr located between @start and @end
> + * overlaps with any of the jump label patch addresses. Code
> + * that wants to modify kernel text should first verify that
> + * it does not overlap with any of the jump label addresses.
> + * Caller must hold jump_label_mutex.
> + *
> + * returns 1 if there is an overlap, 0 otherwise
>   */
> -void jump_label_apply_nops(struct module *mod)
> +int jump_label_text_reserved(void *start, void *end)
>  {
> -	struct jump_entry *iter;
> +	int ret = __jump_label_text_reserved(__start___jump_table,
> +			__stop___jump_table, start, end);
>  
> -	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> -		return;
> +	if (ret)
> +		return ret;
>  
> -	iter = mod->jump_entries;
> -	while (iter < mod->jump_entries + mod->num_jump_entries) {
> -		arch_jump_label_text_poke_early(iter->code);
> -		iter++;
> -	}
> +#ifdef CONFIG_MODULES
> +	ret = __jump_label_mod_text_reserved(start, end);
> +#endif
> +	return ret;
>  }
>  
> -struct notifier_block jump_label_module_nb = {
> -	.notifier_call = jump_label_module_notify,
> -	.priority = 0,
> -};
> -
> -static __init int init_jump_label_module(void)
> +static void jump_label_update(struct jump_label_key *key, int enable)
>  {
> -	return register_module_notifier(&jump_label_module_nb);
> -}
> -early_initcall(init_jump_label_module);
> +	struct jump_entry *entry = key->entries;
>  
> -#endif /* CONFIG_MODULES */
> +	__jump_label_update(key, entry, enable);
> +
> +#ifdef CONFIG_MODULES
> +	__jump_label_mod_update(key, enable);
> +#endif
> +}
>  
>  #endif
> diff --git a/kernel/module.c b/kernel/module.c
> index efa290e..890cadf 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2789,6 +2789,13 @@ static struct module *load_module(void __user *umod,
>  		goto unlock;
>  	}
>  
> +	err = blocking_notifier_call_chain(&module_notify_list,
> +			MODULE_STATE_POST_RELOCATE, mod);
> +	if (err != NOTIFY_DONE) {
> +		err = notifier_to_errno(err);
> +		goto unlock;
> +	}
> +
>  	/* This has to be done once we're sure module name is unique. */
>  	if (!mod->taints)
>  		dynamic_debug_setup(info.debug, info.num_debug);
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index a353a4d..7bacdd3 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -117,7 +117,7 @@ enum event_type_t {
>  	EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
>  };
>  
> -atomic_t perf_task_events __read_mostly;
> +struct jump_label_key_counter perf_task_events __read_mostly;
>  static atomic_t nr_mmap_events __read_mostly;
>  static atomic_t nr_comm_events __read_mostly;
>  static atomic_t nr_task_events __read_mostly;
> @@ -2383,8 +2383,10 @@ static void free_event(struct perf_event *event)
>  	irq_work_sync(&event->pending);
>  
>  	if (!event->parent) {
> -		if (event->attach_state & PERF_ATTACH_TASK)
> -			jump_label_dec(&perf_task_events);
> +		if (event->attach_state & PERF_ATTACH_TASK) {
> +			if (atomic_dec_and_test(&perf_task_events.ref))
> +				jump_label_disable(&perf_task_events.key);
> +		}
>  		if (event->attr.mmap || event->attr.mmap_data)
>  			atomic_dec(&nr_mmap_events);
>  		if (event->attr.comm)
> @@ -4912,7 +4914,7 @@ fail:
>  	return err;
>  }
>  
> -atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
> +struct jump_label_key_counter perf_swevent_enabled[PERF_COUNT_SW_MAX];
>  
>  static void sw_perf_event_destroy(struct perf_event *event)
>  {
> @@ -4920,7 +4922,8 @@ static void sw_perf_event_destroy(struct perf_event *event)
>  
>  	WARN_ON(event->parent);
>  
> -	jump_label_dec(&perf_swevent_enabled[event_id]);
> +	if (atomic_dec_and_test(&perf_swevent_enabled[event_id].ref))
> +		jump_label_disable(&perf_swevent_enabled[event_id].key);
>  	swevent_hlist_put(event);
>  }
>  
> @@ -4945,12 +4948,15 @@ static int perf_swevent_init(struct perf_event *event)
>  
>  	if (!event->parent) {
>  		int err;
> +		atomic_t *ref;
>  
>  		err = swevent_hlist_get(event);
>  		if (err)
>  			return err;
>  
> -		jump_label_inc(&perf_swevent_enabled[event_id]);
> +		ref = &perf_swevent_enabled[event_id].ref;
> +		if (atomic_add_return(1, ref) == 1)
> +			jump_label_enable(&perf_swevent_enabled[event_id].key);
>  		event->destroy = sw_perf_event_destroy;
>  	}
>  
> @@ -5123,6 +5129,10 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>  	u64 period;
>  
>  	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
> +
> +	if (event->state < PERF_EVENT_STATE_ACTIVE)
> +		return HRTIMER_NORESTART;
> +
>  	event->pmu->read(event);
>  
>  	perf_sample_data_init(&data, 0);
> @@ -5174,7 +5184,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>  		local64_set(&hwc->period_left, ktime_to_ns(remaining));
>  
> -		hrtimer_cancel(&hwc->hrtimer);
> +		hrtimer_try_to_cancel(&hwc->hrtimer);
>  	}
>  }
>  
> @@ -5713,8 +5723,10 @@ done:
>  	event->pmu = pmu;
>  
>  	if (!event->parent) {
> -		if (event->attach_state & PERF_ATTACH_TASK)
> -			jump_label_inc(&perf_task_events);
> +		if (event->attach_state & PERF_ATTACH_TASK) {
> +			if (atomic_add_return(1, &perf_task_events.ref) == 1)
> +				jump_label_enable(&perf_task_events.key);
> +		}
>  		if (event->attr.mmap || event->attr.mmap_data)
>  			atomic_inc(&nr_mmap_events);
>  		if (event->attr.comm)
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 343ff27..c848cd8 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -959,7 +959,7 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
>   *
>   * Synchronization rules: Callers must prevent restarting of the timer,
>   * otherwise this function is meaningless. It must not be called from
> - * hardirq contexts. The caller must not hold locks which would prevent
> + * interrupt contexts. The caller must not hold locks which would prevent
>   * completion of the timer's handler. The timer's handler must not call
>   * add_timer_on(). Upon exit the timer is not queued and the handler is
>   * not running on any CPU.
> @@ -971,12 +971,10 @@ int del_timer_sync(struct timer_list *timer)
>  #ifdef CONFIG_LOCKDEP
>  	unsigned long flags;
>  
> -	raw_local_irq_save(flags);
> -	local_bh_disable();
> +	local_irq_save(flags);
>  	lock_map_acquire(&timer->lockdep_map);
>  	lock_map_release(&timer->lockdep_map);
> -	_local_bh_enable();
> -	raw_local_irq_restore(flags);
> +	local_irq_restore(flags);
>  #endif
>  	/*
>  	 * don't use it in hardirq context, because it
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 68187af..13066e8 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -251,9 +251,9 @@ static void set_tracepoint(struct tracepoint_entry **entry,
>  {
>  	WARN_ON(strcmp((*entry)->name, elem->name) != 0);
>  
> -	if (elem->regfunc && !elem->state && active)
> +	if (elem->regfunc && !jump_label_enabled(&elem->key) && active)
>  		elem->regfunc();
> -	else if (elem->unregfunc && elem->state && !active)
> +	else if (elem->unregfunc && jump_label_enabled(&elem->key) && !active)
>  		elem->unregfunc();
>  
>  	/*
> @@ -264,13 +264,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
>  	 * is used.
>  	 */
>  	rcu_assign_pointer(elem->funcs, (*entry)->funcs);
> -	if (!elem->state && active) {
> -		jump_label_enable(&elem->state);
> -		elem->state = active;
> -	} else if (elem->state && !active) {
> -		jump_label_disable(&elem->state);
> -		elem->state = active;
> -	}
> +	if (active)
> +		jump_label_enable(&elem->key);
> +	else if (!active)
> +		jump_label_disable(&elem->key);
>  }
>  
>  /*
> @@ -281,13 +278,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
>   */
>  static void disable_tracepoint(struct tracepoint *elem)
>  {
> -	if (elem->unregfunc && elem->state)
> +	if (elem->unregfunc && jump_label_enabled(&elem->key))
>  		elem->unregfunc();
>  
> -	if (elem->state) {
> -		jump_label_disable(&elem->state);
> -		elem->state = 0;
> -	}
> +	jump_label_disable(&elem->key);
>  	rcu_assign_pointer(elem->funcs, NULL);
>  }
>  
> 
> 
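
The counting scheme in the quoted jump_label_enable()/jump_label_disable()
can be modelled in user space. What follows is only a sketch under stated
assumptions: the sketch_* names are hypothetical, C11 <stdatomic.h> stands in
for the kernel atomic_t helpers, the jump_label_mutex is left out, and a
counter replaces the actual text patching. The point it shows is that only
the 0 -> 1 and 1 -> 0 transitions trigger an update.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a key; not the kernel structure. */
struct sketch_key {
	atomic_int enabled;	/* reference count, non-zero means "branch on" */
	int patch_calls;	/* counts simulated code-patching passes */
};

/* Increment *v unless it is zero; returns true if the increment happened. */
static bool sketch_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

static void sketch_enable(struct sketch_key *key)
{
	/* Fast path: already enabled, just take another reference. */
	if (sketch_inc_not_zero(&key->enabled))
		return;

	/* The quoted patch takes jump_label_mutex here; omitted in this model. */
	if (atomic_fetch_add(&key->enabled, 1) == 0)
		key->patch_calls++;	/* 0 -> 1: patch the branch sites */
}

static void sketch_disable(struct sketch_key *key)
{
	/* Models atomic_dec_and_mutex_lock(): act only on the last reference. */
	if (atomic_fetch_sub(&key->enabled, 1) != 1)
		return;

	key->patch_calls++;		/* 1 -> 0: restore the nops */
}
```

Two enables followed by two disables patch the code twice in this model, once
on the first enable and once on the last disable.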

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:04             ` Jason Baron
@ 2011-02-14 16:14               ` Mathieu Desnoyers
       [not found]               ` <BLU0-SMTP4069A1A89F06CDFF9B28F896D00@phx.gbl>
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 16:14 UTC (permalink / raw)
  To: Jason Baron
  Cc: Peter Zijlstra, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Jason Baron (jbaron@redhat.com) wrote:
> On Mon, Feb 14, 2011 at 04:57:04PM +0100, Peter Zijlstra wrote:
> > On Mon, 2011-02-14 at 10:51 -0500, Jason Baron wrote:
> > > On Sat, Feb 12, 2011 at 07:47:45PM +0100, Peter Zijlstra wrote:
> > > > On Fri, 2011-02-11 at 22:38 +0100, Peter Zijlstra wrote:
> > > > > 
> > > > > So why can't we make that jump_label_entry::refcount and
> > > > > jump_label_key::state an atomic_t and be done with it? 
> > > > 
> > > > So I had a bit of a poke at this because I didn't quite understand why
> > > > all that stuff was as it was. I applied both Jason's patches and then
> > > > basically rewrote kernel/jump_label.c just for kicks ;-)
> > > > 
> > > > I haven't tried compiling this, let alone running it, but provided I
> > > > didn't actually forget anything the storage per key is now 16 bytes when
> > > > modules are disabled and 24 * (1 + mods) bytes for when they are
> > > > enabled. The old code had 64 + 40 * mods bytes.
> > > > 
> > > > I still need to clean up the static_branch_else bits and look at !x86
> > > > aside from the already mentioned bits.. but what do people think?
> > > > 
> > > > ---
> > > 
> > > > Generally, I really like this! It's the direction I think the jump label
> > > > code should be going. The complete removal of the hash table makes the
> > > > design a lot better and simpler. We just need to get some of the details
> > > cleaned up, and of course we need this to compile :) But I don't see any
> > > fundamental problems with this approach. 
> > > 
> > > Things that still need to be sorted out:
> > > 
> > > > 1) Since jump_label.h is included in kernel.h (indirectly, via
> > > > dynamic_debug.h), the atomic_t definitions could be problematic, since
> > > > atomic.h includes kernel.h indirectly... so we might need some header
> > > > magic.
> > 
> > Yes, I remember running into that when I did the jump_label_ref stuff,
> > some head-scratching is in order there.
> > 
> 
> yes. i suspect this might be the hardest bit of this...

I remember that atomic_t is defined in types.h now rather than atomic.h.
Any reason why you should keep including atomic.h from jump_label.h ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]               ` <BLU0-SMTP4069A1A89F06CDFF9B28F896D00@phx.gbl>
@ 2011-02-14 16:25                 ` Peter Zijlstra
  2011-02-14 16:29                   ` Jason Baron
  0 siblings, 1 reply; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 16:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, 2011-02-14 at 11:14 -0500, Mathieu Desnoyers wrote:
> 
> I remember that atomic_t is defined in types.h now rather than atomic.h.
> Any reason why you should keep including atomic.h from jump_label.h ?

Ooh, shiny.. we could probably move the few atomic_{read,inc,dec} users
in jump_label.h into out of line functions and have this sorted.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:25                 ` Peter Zijlstra
@ 2011-02-14 16:29                   ` Jason Baron
  2011-02-14 16:37                     ` Peter Zijlstra
  0 siblings, 1 reply; 113+ messages in thread
From: Jason Baron @ 2011-02-14 16:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, Feb 14, 2011 at 05:25:54PM +0100, Peter Zijlstra wrote:
> > 
> > I remember that atomic_t is defined in types.h now rather than atomic.h.
> > Any reason why you should keep including atomic.h from jump_label.h ?
> 
> Ooh, shiny.. we could probably move the few atomic_{read,inc,dec} users
> in jump_label.h into out of line functions and have this sorted.
> 

inc and dec sure, but atomic_read() for the disabled case needs to be
inline....
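
To make the split concrete, here is a rough sketch, assuming a build without
code patching where static_branch() falls back to reading the counter. The
fallback body and the sketch_key_inc()/sketch_key_dec() helpers are
assumptions made for illustration, and the GCC/Clang __atomic builtins stand
in for the kernel's atomic_inc()/atomic_dec():

```c
#include <stdbool.h>

/* Stand-in for the kernel's atomic_t. */
typedef struct { int counter; } atomic_t;

struct jump_label_key {
	atomic_t enabled;	/* field name as used in the quoted patch */
};

/*
 * Has to stay inline: this is the test executed at every branch site,
 * so an out-of-line call here would defeat the purpose.
 */
static inline bool static_branch(struct jump_label_key *key)
{
	/* a single volatile load, standing in for atomic_read() */
	return *(volatile int *)&key->enabled.counter != 0;
}

/*
 * These only run on state changes, so they could live out of line in a
 * .c file that is free to include the full atomic header.
 */
void sketch_key_inc(struct jump_label_key *key)
{
	__atomic_fetch_add(&key->enabled.counter, 1, __ATOMIC_SEQ_CST);
}

void sketch_key_dec(struct jump_label_key *key)
{
	__atomic_fetch_sub(&key->enabled.counter, 1, __ATOMIC_SEQ_CST);
}
```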

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:29                   ` Jason Baron
@ 2011-02-14 16:37                     ` Peter Zijlstra
  2011-02-14 16:43                       ` Mathieu Desnoyers
                                         ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 16:37 UTC (permalink / raw)
  To: Jason Baron
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, 2011-02-14 at 11:29 -0500, Jason Baron wrote:
> On Mon, Feb 14, 2011 at 05:25:54PM +0100, Peter Zijlstra wrote:
> > > 
> > > I remember that atomic_t is defined in types.h now rather than atomic.h.
> > > Any reason why you should keep including atomic.h from jump_label.h ?
> > 
> > Ooh, shiny.. we could probably move the few atomic_{read,inc,dec} users
> > in jump_label.h into out of line functions and have this sorted.
> > 
> 
> inc and dec sure, but atomic_read() for the disabled case needs to be
> inline....

D'0h yes of course, I was thinking about jump_label_enabled(), but
there's still the static_branch() implementation to consider.

We could of course cheat and implement our own version of atomic_read() in
order to avoid the whole header mess, but that's not pretty at all.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:37                     ` Peter Zijlstra
@ 2011-02-14 16:43                       ` Mathieu Desnoyers
  2011-02-14 16:46                       ` Steven Rostedt
       [not found]                       ` <BLU0-SMTP64371A838030ED92A7CCB696D00@phx.gbl>
  2 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Mon, 2011-02-14 at 11:29 -0500, Jason Baron wrote:
> > On Mon, Feb 14, 2011 at 05:25:54PM +0100, Peter Zijlstra wrote:
> > > > 
> > > > I remember that atomic_t is defined in types.h now rather than atomic.h.
> > > > Any reason why you should keep including atomic.h from jump_label.h ?
> > > 
> > > Ooh, shiny.. we could probably move the few atomic_{read,inc,dec} users
> > > in jump_label.h into out of line functions and have this sorted.
> > > 
> > 
> > inc and dec sure, but atomic_read() for the disabled case needs to be
> > inline....
> 
> D'0h yes of course, I was thinking about jump_label_enabled(), but
> there's still the static_branch() implementation to consider.
> 
> We could of course cheat and implement our own version of atomic_read() in
> order to avoid the whole header mess, but that's not pretty at all.
> 

OK, so the other way around then : why does kernel.h need to include
dynamic_debug.h (which includes jump_label.h) ?

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:37                     ` Peter Zijlstra
  2011-02-14 16:43                       ` Mathieu Desnoyers
@ 2011-02-14 16:46                       ` Steven Rostedt
  2011-02-14 16:53                         ` Peter Zijlstra
  2011-02-14 17:18                         ` Steven Rostedt
       [not found]                       ` <BLU0-SMTP64371A838030ED92A7CCB696D00@phx.gbl>
  2 siblings, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-14 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx, andi, roland,
	rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney,
	michael, linux-kernel

On Mon, 2011-02-14 at 17:37 +0100, Peter Zijlstra wrote:

> We could of course cheat implement our own version of atomic_read() in
> order to avoid the whole header mess, but that's not pretty at all

Oh God please no! ;)

atomic_read() is implemented per arch.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:46                       ` Steven Rostedt
@ 2011-02-14 16:53                         ` Peter Zijlstra
  2011-02-14 17:18                         ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 16:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx, andi, roland,
	rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney,
	michael, linux-kernel

On Mon, 2011-02-14 at 11:46 -0500, Steven Rostedt wrote:
> On Mon, 2011-02-14 at 17:37 +0100, Peter Zijlstra wrote:
> 
> > We could of course cheat implement our own version of atomic_read() in
> > order to avoid the whole header mess, but that's not pretty at all
> 
> Oh God please no! ;)
> 
> atomic_read() is implemented per arch.

Ah, but it needn't be:

static inline int atomic_read(atomic_t *a)
{
	return ACCESS_ONCE(a->counter);
}

is basically it.
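
A minimal userspace sketch of the generic version above, for illustration
only. The atomic_t layout and the ACCESS_ONCE() definition are assumptions
modelled on the kernel's rather than taken from its headers,
sketch_atomic_read() is a made-up name, and __typeof__ is a GCC/Clang
extension:

```c
/* Assumed stand-in for the kernel's atomic_t. */
typedef struct {
	int counter;
} atomic_t;

/*
 * Assumed stand-in for ACCESS_ONCE(): go through a volatile lvalue so the
 * compiler emits exactly one access and cannot reuse an earlier value.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical arch-independent read: one volatile load, no ordering. */
static inline int sketch_atomic_read(const atomic_t *a)
{
	return ACCESS_ONCE(a->counter);
}
```

Called on an atomic_t whose counter is 42 this returns 42. The cast adds no
barrier and no ordering; it only stops the compiler from caching the value
in a register across calls.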


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 16:46                       ` Steven Rostedt
  2011-02-14 16:53                         ` Peter Zijlstra
@ 2011-02-14 17:18                         ` Steven Rostedt
  2011-02-14 17:23                           ` Mike Frysinger
  2011-02-14 17:27                           ` Peter Zijlstra
  1 sibling, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-14 17:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx, andi, roland,
	rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney,
	michael, linux-kernel

On Mon, 2011-02-14 at 11:46 -0500, Steven Rostedt wrote:
> On Mon, 2011-02-14 at 17:37 +0100, Peter Zijlstra wrote:
> 
> > We could of course cheat implement our own version of atomic_read() in
> > order to avoid the whole header mess, but that's not pretty at all
> 
> Oh God please no! ;)
> 
> atomic_read() is implemented per arch.

Hmm, maybe this isn't so bad:

alpha:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

arm:
#define atomic_read(v)  (*(volatile int *)&(v)->counter)

avr32:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

blackfin:
#define atomic_read(v)  __raw_uncached_fetch_asm(&(v)->counter)

cris:
#define atomic_read(v) (*(volatile int *)&(v)->counter)

frv:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

h8300:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

ia64:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

m32r:
#define atomic_read(v)  (*(volatile int *)&(v)->counter)

m68k:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

microblaze: uses generic which is:


mips:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

mn10300:
#define atomic_read(v)  ((v)->counter)

parisc:
static __inline__ int atomic_read(const atomic_t *v)
{
        return (*(volatile int *)&(v)->counter);
}

powerpc:
static __inline__ int atomic_read(const atomic_t *v)
{
        int t;

        __asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));

        return t;
}

which is still pretty much a volatile read


s390:
static inline int atomic_read(const atomic_t *v)
{
        barrier();
        return v->counter;
}

score:
uses generic

sh:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

sparc 32:
sparc 64:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)


tile:
static inline int atomic_read(const atomic_t *v)
{
       return v->counter;
}

Hmm, nothing volatile at all?

x86:
static inline int atomic_read(const atomic_t *v)
{
        return (*(volatile int *)&(v)->counter);
}

xtensa:
#define atomic_read(v)          (*(volatile int *)&(v)->counter)

So all but a few have basically (as you said on IRC)
#define atomic_read(v) ACCESS_ONCE(v)

Those few are blackfin, s390, powerpc and tile.

s390 probably doesn't need that much of a big hammer with atomic_read()
(unless it uses it in its own arch that expects it to be such).

powerpc could probably be converted to just the volatile code as
everything else. Not sure why it did it that way. To be different?

tile just looks wrong, but won't be hurt by adding volatile to that.

blackfin, seems to be doing quite a lot. Not sure if it is required, but
that may need a bit of investigating to understand why it does the
raw_uncached thing.


Maybe we could move the atomic_read() out of atomic and make it a
standard inline for all (in kernel.h)?

-- Steve


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:18                         ` Steven Rostedt
@ 2011-02-14 17:23                           ` Mike Frysinger
  2011-02-14 17:27                           ` Peter Zijlstra
  1 sibling, 0 replies; 113+ messages in thread
From: Mike Frysinger @ 2011-02-14 17:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel

On Mon, Feb 14, 2011 at 12:18, Steven Rostedt wrote:
> blackfin:
> #define atomic_read(v)  __raw_uncached_fetch_asm(&(v)->counter)
>
> blackfin, seems to be doing quite a lot. Not sure if it is required, but
> that may need a bit of investigating to understand why it does the
> raw_uncached thing.

this is only for SMP ports, and it's due to our lack of
cache-coherency.  for non-SMP, we use asm-generic.
-mike

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:18                         ` Steven Rostedt
  2011-02-14 17:23                           ` Mike Frysinger
@ 2011-02-14 17:27                           ` Peter Zijlstra
  2011-02-14 17:29                             ` Mike Frysinger
                                               ` (2 more replies)
  1 sibling, 3 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 17:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx, andi, roland,
	rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney,
	michael, linux-kernel, Mike Frysinger, Chris Metcalf, dhowells,
	Martin Schwidefsky, heiko.carstens, benh

On Mon, 2011-02-14 at 12:18 -0500, Steven Rostedt wrote:

> mn10300:
> #define atomic_read(v)  ((v)->counter)

> tile:
> static inline int atomic_read(const atomic_t *v)
> {
>        return v->counter;
> }

Yeah, I already sent email to the respective maintainers telling them
they might want to fix this ;-)


> So all but a few have basically (as you said on IRC)
> #define atomic_read(v) ACCESS_ONCE(v)

ACCESS_ONCE(v->counter), but yeah :-)
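
To spell out why the argument matters, a small sketch. The ACCESS_ONCE()
definition is an assumption modelled on the kernel's, the two *_style_read
names are hypothetical, and __typeof__ is a GCC/Clang extension:

```c
typedef struct {
	int counter;
} atomic_t;

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* The per-arch spelling quoted in the survey. */
#define arch_style_read(v)	(*(volatile int *)&(v)->counter)

/* The same load spelled with ACCESS_ONCE(); note the ->counter. */
#define once_style_read(v)	ACCESS_ONCE((v)->counter)

/*
 * By contrast, ACCESS_ONCE(v) with v an atomic_t pointer is a volatile
 * load of the pointer variable itself: it yields the pointer, not the
 * counter value.
 */
```

With an atomic_t a holding 7 and p pointing at it, both arch_style_read(p)
and once_style_read(p) give 7, while ACCESS_ONCE(p) just gives back &a.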

> Those few are blackfin, s390, powerpc and tile.
> 
> s390 probably doesn't need that much of a big hammer with atomic_read()
> (unless it uses it in its own arch that expects it to be such).

Right, it could just do the volatile thing..

> powerpc could probably be converted to just the volatile code as
> everything else. Not sure why it did it that way. To be different?

Maybe that code was written before we all got inventive with the
volatile cast stuff..

> blackfin, seems to be doing quite a lot. Not sure if it is required, but
> that may need a bit of investigating to understand why it does the
> raw_uncached thing.

From what I can tell it's flushing its write cache, invalidating its
d-cache and then issuing the read, something which is _way_ overboard.

> Maybe we could move the atomic_read() out of atomic and make it a
> standard inline for all (in kernel.h)?

Certainly looks like that might work..


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:27                           ` Peter Zijlstra
@ 2011-02-14 17:29                             ` Mike Frysinger
  2011-02-14 17:38                               ` Peter Zijlstra
  2011-02-14 17:38                             ` Will Newton
  2011-02-15 15:20                             ` Heiko Carstens
  2 siblings, 1 reply; 113+ messages in thread
From: Mike Frysinger @ 2011-02-14 17:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Chris Metcalf, dhowells,
	Martin Schwidefsky, heiko.carstens, benh

On Mon, Feb 14, 2011 at 12:27, Peter Zijlstra wrote:
> On Mon, 2011-02-14 at 12:18 -0500, Steven Rostedt wrote:
>> blackfin, seems to be doing quite a lot. Not sure if it is required, but
>> that may need a bit of investigating to understand why it does the
>> raw_uncached thing.
>
> From what I can tell its flushing its write cache, invalidating its
> d-cache and then issue the read, something which is _way_ overboard.

not when the cores in a SMP system lack cache coherency

please to review:
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:smp-like
-mike

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:29                             ` Mike Frysinger
@ 2011-02-14 17:38                               ` Peter Zijlstra
  2011-02-14 17:45                                 ` Mike Frysinger
  0 siblings, 1 reply; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 17:38 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Chris Metcalf, dhowells,
	Martin Schwidefsky, heiko.carstens, benh, Paul E. McKenney

On Mon, 2011-02-14 at 12:29 -0500, Mike Frysinger wrote:
> On Mon, Feb 14, 2011 at 12:27, Peter Zijlstra wrote:
> > On Mon, 2011-02-14 at 12:18 -0500, Steven Rostedt wrote:
> >> blackfin, seems to be doing quite a lot. Not sure if it is required, but
> >> that may need a bit of investigating to understand why it does the
> >> raw_uncached thing.
> >
> > From what I can tell its flushing its write cache, invalidating its
> > d-cache and then issue the read, something which is _way_ overboard.
> 
> not when the cores in a SMP system lack cache coherency

But atomic_read() is completely unordered, so even on a non-coherent
system a regular read should suffice, any old value is correct.

The only problem would be when you could get cache aliasing and read
something totally unrelated.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:27                           ` Peter Zijlstra
  2011-02-14 17:29                             ` Mike Frysinger
@ 2011-02-14 17:38                             ` Will Newton
  2011-02-14 17:43                               ` Peter Zijlstra
  2011-02-15 15:20                             ` Heiko Carstens
  2 siblings, 1 reply; 113+ messages in thread
From: Will Newton @ 2011-02-14 17:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, Feb 14, 2011 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:

>> So all but a few have basically (as you said on IRC)
>> #define atomic_read(v) ACCESS_ONCE(v)
>
> ACCESS_ONCE(v->counter), but yeah :-)

I maintain an out-of-tree architecture where that isn't the case
unfortunately [1]. Not expecting any special favours for being
out-of-tree of course, but just thought I would add that data point.

[1] Our atomic operations go around the cache rather than through it,
so the value of an atomic cannot be read with a normal load
instruction.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:38                             ` Will Newton
@ 2011-02-14 17:43                               ` Peter Zijlstra
  2011-02-14 17:50                                 ` Will Newton
  0 siblings, 1 reply; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 17:43 UTC (permalink / raw)
  To: Will Newton
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, 2011-02-14 at 17:38 +0000, Will Newton wrote:
> On Mon, Feb 14, 2011 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> >> So all but a few have basically (as you said on IRC)
> >> #define atomic_read(v) ACCESS_ONCE(v)
> >
> > ACCESS_ONCE(v->counter), but yeah :-)
> 
> I maintain an out-of-tree architecture where that isn't the case
> unfortunately [1]. Not expecting any special favours for being
> out-of-tree of course, but just thought I would add that data point.
> 
> [1] Our atomic operations go around the cache rather than through it,
> so the value of an atomic cannot be read with a normal load
> instruction.

Cannot how? It would observe a stale value? That is acceptable for
atomic_read(). 


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:38                               ` Peter Zijlstra
@ 2011-02-14 17:45                                 ` Mike Frysinger
  0 siblings, 0 replies; 113+ messages in thread
From: Mike Frysinger @ 2011-02-14 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Chris Metcalf, dhowells,
	Martin Schwidefsky, heiko.carstens, benh, Paul E. McKenney

On Mon, Feb 14, 2011 at 12:38, Peter Zijlstra wrote:
> On Mon, 2011-02-14 at 12:29 -0500, Mike Frysinger wrote:
>> On Mon, Feb 14, 2011 at 12:27, Peter Zijlstra wrote:
>> > On Mon, 2011-02-14 at 12:18 -0500, Steven Rostedt wrote:
>> >> blackfin, seems to be doing quite a lot. Not sure if it is required, but
>> >> that may need a bit of investigating to understand why it does the
>> >> raw_uncached thing.
>> >
>> > From what I can tell its flushing its write cache, invalidating its
>> > d-cache and then issue the read, something which is _way_ overboard.
>>
>> not when the cores in a SMP system lack cache coherency
>
> But atomic_read() is completely unordered, so even on a non-coherent
> system a regular read should suffice, any old value is correct.

the words you use seem to form a line of reasoning that makes sense to
me.  we'll have to play first though to double check.

> The only problem would be when you could get cache aliasing and read
> something totally unrelated.

being a nommu arch, there shouldn't be any cache aliasing issues.
we're just trying to make sure that what another core has pushed out
isn't stale in another core's cache when the other core does the read.
-mike

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:43                               ` Peter Zijlstra
@ 2011-02-14 17:50                                 ` Will Newton
  2011-02-14 18:04                                   ` Peter Zijlstra
  2011-02-14 18:24                                   ` Peter Zijlstra
  0 siblings, 2 replies; 113+ messages in thread
From: Will Newton @ 2011-02-14 17:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, Feb 14, 2011 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2011-02-14 at 17:38 +0000, Will Newton wrote:
>> On Mon, Feb 14, 2011 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> >> So all but a few have basically (as you said on IRC)
>> >> #define atomic_read(v) ACCESS_ONCE(v)
>> >
>> > ACCESS_ONCE(v->counter), but yeah :-)
>>
>> I maintain an out-of-tree architecture where that isn't the case
>> unfortunately [1]. Not expecting any special favours for being
>> out-of-tree of course, but just thought I would add that data point.
>>
>> [1] Our atomic operations go around the cache rather than through it,
>> so the value of an atomic cannot be read with a normal load
>> instruction.
>
> Cannot how? It would observe a stale value? That is acceptable for
> atomic_read().

It would observe a stale value, but that value would only be updated
when the cache line was reloaded from main memory which would have to
be triggered by either eviction or cache flushing. So it could get
pretty stale. Whilst that's probably within the spec. of atomic_read I
suspect it would lead to problems in practice. I could be wrong
though.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:50                                 ` Will Newton
@ 2011-02-14 18:04                                   ` Peter Zijlstra
  2011-02-14 18:24                                   ` Peter Zijlstra
  1 sibling, 0 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 18:04 UTC (permalink / raw)
  To: Will Newton
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, 2011-02-14 at 17:50 +0000, Will Newton wrote:
> On Mon, Feb 14, 2011 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2011-02-14 at 17:38 +0000, Will Newton wrote:
> >> On Mon, Feb 14, 2011 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> >> So all but a few have basically (as you said on IRC)
> >> >> #define atomic_read(v) ACCESS_ONCE(v)
> >> >
> >> > ACCESS_ONCE(v->counter), but yeah :-)
> >>
> >> I maintain an out-of-tree architecture where that isn't the case
> >> unfortunately [1]. Not expecting any special favours for being
> >> out-of-tree of course, but just thought I would add that data point.
> >>
> >> [1] Our atomic operations go around the cache rather than through it,
> >> so the value of an atomic cannot be read with a normal load
> >> instruction.
> >
> > Cannot how? It would observe a stale value? That is acceptable for
> > atomic_read().
> 
> It would observe a stale value, but that value would only be updated
> when the cache line was reloaded from main memory which would have to
> be triggered by either eviction or cache flushing. So it could get
> pretty stale. Whilst that's probably within the spec. of atomic_read I
> suspect it would lead to problems in practice. I could be wrong
> though.

Arguably, finding such cases would be a Good (TM) thing.. but yeah, I
can imagine you're not too keen on being the one finding them.

Luckily it looks like you're in the same boat as blackfin-smp is.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:50                                 ` Will Newton
  2011-02-14 18:04                                   ` Peter Zijlstra
@ 2011-02-14 18:24                                   ` Peter Zijlstra
  2011-02-14 18:53                                     ` Mathieu Desnoyers
  2011-02-14 21:29                                     ` Steven Rostedt
  1 sibling, 2 replies; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 18:24 UTC (permalink / raw)
  To: Will Newton
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, 2011-02-14 at 17:50 +0000, Will Newton wrote:
> 
> It would observe a stale value, but that value would only be updated
> when the cache line was reloaded from main memory which would have to
> be triggered by either eviction or cache flushing. So it could get
> pretty stale. Whilst that's probably within the spec. of atomic_read I
> suspect it would lead to problems in practice. I could be wrong
> though. 

Right, so the typical scenario that could cause pain is something like:

while (atomic_read(&foo) != n)
  cpu_relax();

and the problem is that cpu_relax() doesn't know which particular
cacheline to flush in order to make things go faster, hm?




^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 18:24                                   ` Peter Zijlstra
@ 2011-02-14 18:53                                     ` Mathieu Desnoyers
  2011-02-14 21:29                                     ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Newton, Steven Rostedt, Jason Baron, hpa, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem, sam,
	ddaney, michael, linux-kernel, Mike Frysinger, Chris Metcalf,
	dhowells, Martin Schwidefsky, heiko.carstens, benh

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Mon, 2011-02-14 at 17:50 +0000, Will Newton wrote:
> > 
> > It would observe a stale value, but that value would only be updated
> > when the cache line was reloaded from main memory which would have to
> > be triggered by either eviction or cache flushing. So it could get
> > pretty stale. Whilst that's probably within the spec. of atomic_read I
> > suspect it would lead to problems in practice. I could be wrong
> > though. 
> 
> Right, so the typical scenario that could cause pain is something like:
> 
> while (atomic_read(&foo) != n)
>   cpu_relax();
> 
> and the problem is that cpu_relax() doesn't know which particular
> cacheline to flush in order to make things go faster, hm?

As an information point, this is why I mapped "uatomic_read()" to
"CMM_LOAD_SHARED" in my userspace RCU library rather than just doing a
volatile access. On cache-coherent architectures, the arch-specific code
turns CMM_LOAD_SHARED into a simple volatile access, but for
non-cache-coherent architectures, it can call the required
architecture-level primitives to refresh the stale data.

FWIW, I also have "CMM_STORE_SHARED" which does pretty much the same
thing. I use these for rcu_assign_pointer() and rcu_dereference() (thus
replacing "ACCESS_ONCE()").

The more detailed comment and macros are found at
http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/system.h
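
A rough sketch of the shape of that idea, not the library's actual code
(see the link above for the real macros). Every SKETCH_* / sketch_* name
and the SKETCH_NON_COHERENT switch are hypothetical, and it relies on the
GCC/Clang statement-expression and __typeof__ extensions:

```c
#ifdef SKETCH_NON_COHERENT
/* A real port would call its cache invalidate/write-back primitive here. */
extern void sketch_arch_cache_sync(void);
#define sketch_cache_sync()	sketch_arch_cache_sync()
#else
/* Cache-coherent hardware: nothing to do, a volatile access is enough. */
#define sketch_cache_sync()	do { } while (0)
#endif

#define SKETCH_ACCESS_ONCE(x)	(*(volatile __typeof__(x) *)&(x))

/* Read side: sync the cache first, then do a single volatile load. */
#define SKETCH_LOAD_SHARED(p)				\
	__extension__ ({				\
		sketch_cache_sync();			\
		SKETCH_ACCESS_ONCE(p);			\
	})

/* Write side: a single volatile store, then push it out. */
#define SKETCH_STORE_SHARED(p, v)			\
	__extension__ ({				\
		__typeof__(p) _v = (v);			\
		SKETCH_ACCESS_ONCE(p) = _v;		\
		sketch_cache_sync();			\
		_v;					\
	})
```

Built without SKETCH_NON_COHERENT, both macros collapse to plain volatile
accesses, which is the cache-coherent case described above.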

I hope this helps,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                       ` <BLU0-SMTP64371A838030ED92A7CCB696D00@phx.gbl>
@ 2011-02-14 18:54                         ` Jason Baron
  2011-02-14 19:20                           ` Peter Zijlstra
  0 siblings, 1 reply; 113+ messages in thread
From: Jason Baron @ 2011-02-14 18:54 UTC (permalink / raw)
  To: Mathieu Desnoyers, peterz
  Cc: hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, Feb 14, 2011 at 11:43:43AM -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Mon, 2011-02-14 at 11:29 -0500, Jason Baron wrote:
> > > On Mon, Feb 14, 2011 at 05:25:54PM +0100, Peter Zijlstra wrote:
> > > > > 
> > > > > I remember that atomic_t is defined in types.h now rather than atomic.h.
> > > > > Any reason why you should keep including atomic.h from jump_label.h ?
> > > > 
> > > > Ooh, shiny.. we could probably move the few atomic_{read,inc,dec} users
> > > > in jump_label.h into out of line functions and have this sorted.
> > > > 
> > > 
> > > inc and dec sure, but atomic_read() for the disabled case needs to be
> > > inline....
> > 
> > D'0h yes of course, I was thinking about jump_label_enabled(), but
> > there's still the static_branch() implementation to consider.
> > 
> > We could of course cheat implement our own version of atomic_read() in
> > order to avoid the whole header mess, but that's not pretty at all
> > 
> 
> OK, so the other way around then : why does kernel.h need to include
> dynamic_debug.h (which includes jump_label.h) ?
> 

Well, it's used to dynamically enable/disable pr_debug() statements, which
have now moved to linux/printk.h, which is included by kernel.h.

I don't need an atomic_read() in the disabled case for dynamic debug,
and I would be OK with an #ifdef CONFIG_JUMP_LABEL in dynamic_debug.h. It's
not the prettiest solution, but I can certainly live with it for now, so
that we can sort out the atomic_read() issue independently.

Peter, Mathieu, are you guys ok with this?

-Jason

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 18:54                         ` Jason Baron
@ 2011-02-14 19:20                           ` Peter Zijlstra
  2011-02-14 19:48                             ` Mathieu Desnoyers
  0 siblings, 1 reply; 113+ messages in thread
From: Peter Zijlstra @ 2011-02-14 19:20 UTC (permalink / raw)
  To: Jason Baron
  Cc: Mathieu Desnoyers, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

On Mon, 2011-02-14 at 13:54 -0500, Jason Baron wrote:

> I don't need an atomic_read() in the disabled case for dynamic debug,
> and I would be ok, #ifdef CONFIG_JUMP_LABEL, in dynamic_debug.h. Its not
> the prettiest solution. But I can certainly live with it for now, so
> that we can sort out the atomic_read() issue independently.
> 
> Peter, Mathieu, are you guys ok with this?

Yeah, let's see where that gets us.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 19:20                           ` Peter Zijlstra
@ 2011-02-14 19:48                             ` Mathieu Desnoyers
  0 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 19:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Baron, hpa, rostedt, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, davem, sam, ddaney, michael,
	linux-kernel

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Mon, 2011-02-14 at 13:54 -0500, Jason Baron wrote:
> 
> > I don't need an atomic_read() in the disabled case for dynamic debug,
> > and I would be ok, #ifdef CONFIG_JUMP_LABEL, in dynamic_debug.h. Its not
> > the prettiest solution. But I can certainly live with it for now, so
> > that we can sort out the atomic_read() issue independently.
> > 
> > Peter, Mathieu, are you guys ok with this?
> 
> Yeah, lets see where that gets us.
> 

Works for me as long as you put a nice witty comment around this
disgusting hack (something related to "include Hell", where bad
programmers go after they die). ;-)

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 18:24                                   ` Peter Zijlstra
  2011-02-14 18:53                                     ` Mathieu Desnoyers
@ 2011-02-14 21:29                                     ` Steven Rostedt
  2011-02-14 21:39                                       ` Steven Rostedt
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2011-02-14 21:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Newton, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, 2011-02-14 at 19:24 +0100, Peter Zijlstra wrote:
> On Mon, 2011-02-14 at 17:50 +0000, Will Newton wrote:
> > 
> > It would observe a stale value, but that value would only be updated
> > when the cache line was reloaded from main memory which would have to
> > be triggered by either eviction or cache flushing. So it could get
> > pretty stale. Whilst that's probably within the spec. of atomic_read I
> > suspect it would lead to problems in practice. I could be wrong
> > though. 
> 
> Right, so the typical scenario that could cause pain is something like:
> 
> while (atomic_read(&foo) != n)
>   cpu_relax();
> 
> and the problem is that cpu_relax() doesn't know which particular
> cacheline to flush in order to make things go faster, hm?

But what about any global variable? Can't we also just have:

	while (global != n)
		cpu_relax();

?

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 21:29                                     ` Steven Rostedt
@ 2011-02-14 21:39                                       ` Steven Rostedt
  2011-02-14 21:46                                         ` David Miller
  2011-02-14 22:15                                         ` Matt Fleming
  0 siblings, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-14 21:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Newton, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, 2011-02-14 at 16:29 -0500, Steven Rostedt wrote:

> > while (atomic_read(&foo) != n)
> >   cpu_relax();
> > 
> > and the problem is that cpu_relax() doesn't know which particular
> > cacheline to flush in order to make things go faster, hm?
> 
> But what about any global variable? Can't we also just have:
> 
> 	while (global != n)
> 		cpu_relax();
> 
> ?

Matt Fleming answered this for me on IRC, and I'll share the answer here
(for those that are dying to know ;)

Seems that the atomic_inc() uses ll/sc operations that do not affect the
cache. Thus the problem is only with atomic_read() as

	while(atomic_read(&foo) != n)
		cpu_relax();

will just check the cached version of foo. But because ll/sc skips the
cache, foo will never update there. That is, atomic_inc() and friends do
not touch the cache, and the CPU spinning in this loop is only
checking the cache, so it will spin forever.

Thus it is not about global, as global is updated by normal means and
will update the caches. atomic_t is updated via the ll/sc that ignores
the cache and causes all this to break down. IOW... broken hardware ;)

Matt, feel free to correct this if it is wrong.
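
A toy, single-threaded model of the behaviour described above, purely to
make the failure mode concrete. It assumes the description is accurate
(atomic updates land in memory, ordinary loads are served from a cached
copy that nothing refreshes) and does not model any real hardware; all
toy_* names are made up:

```c
/* One word of "main memory" plus one CPU's cached copy of it. */
struct toy_line {
	int memory;	/* what the ll/sc-style atomic ops update */
	int cache;	/* what an ordinary load on the spinning CPU sees */
};

/* Atomic increment that goes around the cache, straight to memory. */
static void toy_atomic_inc(struct toy_line *l)
{
	l->memory++;
}

/* Ordinary cached load: never notices the update on its own. */
static int toy_cached_read(const struct toy_line *l)
{
	return l->cache;
}

/* Uncached read: refresh the cached copy from memory, then return it. */
static int toy_uncached_read(struct toy_line *l)
{
	l->cache = l->memory;
	return l->cache;
}

/*
 * Spin until the value read equals n. Returns the number of failed
 * attempts before success, or -1 if max_iters reads never saw n.
 */
static int toy_spin_until(struct toy_line *l, int n, int max_iters,
			  int uncached)
{
	for (int i = 0; i < max_iters; i++) {
		int v = uncached ? toy_uncached_read(l) : toy_cached_read(l);

		if (v == n)
			return i;
	}
	return -1;
}
```

After one toy_atomic_inc() on a zeroed line, spinning for 1 with cached
reads gives up with -1, while the uncached variant succeeds on the first
read.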

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 21:39                                       ` Steven Rostedt
@ 2011-02-14 21:46                                         ` David Miller
  2011-02-14 22:20                                           ` Steven Rostedt
  2011-02-14 22:37                                           ` Matt Fleming
  2011-02-14 22:15                                         ` Matt Fleming
  1 sibling, 2 replies; 113+ messages in thread
From: David Miller @ 2011-02-14 21:46 UTC (permalink / raw)
  To: rostedt
  Cc: peterz, will.newton, jbaron, mathieu.desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh

From: Steven Rostedt <rostedt@goodmis.org>
Date: Mon, 14 Feb 2011 16:39:36 -0500

> Thus it is not about global, as global is updated by normal means and
> will update the caches. atomic_t is updated via the ll/sc that ignores
> the cache and causes all this to break down. IOW... broken hardware ;)

I don't see how cache coherency can possibly work if the hardware
behaves this way.

In cache aliasing situations, yes I can understand a L1 cache visibility
issue being present, but with kernel only stuff that should never happen
otherwise we have a bug in the arch cache flushing support.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 21:39                                       ` Steven Rostedt
  2011-02-14 21:46                                         ` David Miller
@ 2011-02-14 22:15                                         ` Matt Fleming
  1 sibling, 0 replies; 113+ messages in thread
From: Matt Fleming @ 2011-02-14 22:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Will Newton, Jason Baron, Mathieu Desnoyers, hpa,
	mingo, tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec,
	avi, davem, sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, heiko.carstens,
	benh

On Mon, 14 Feb 2011 16:39:36 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2011-02-14 at 16:29 -0500, Steven Rostedt wrote:
> 
> > > while (atomic_read(&foo) != n)
> > >   cpu_relax();
> > > 
> > > and the problem is that cpu_relax() doesn't know which particular
> > > cacheline to flush in order to make things go faster, hm?
> > 
> > But what about any global variable? Can't we also just have:
> > 
> > 	while (global != n)
> > 		cpu_relax();
> > 
> > ?
> 
> Matt Fleming answered this for me on IRC, and I'll share the answer
> here (for those that are dying to know ;)
> 
> Seems that the atomic_inc() uses ll/sc operations that do not affect
> the cache. Thus the problem is only with atomic_read() as
> 
> 	while(atomic_read(&foo) != n)
> 		cpu_relax();
> 
> Will just check the cache version of foo. But because ll/sc skips the
> cache, the foo will never update. That is, atomic_inc() and friends do
> not touch the cache, and the CPU spinning in this loop is only
> checking the cache, and will spin forever.

Right. When I wrote the atomic_read() implementation that Will is
talking about I used the ll-equivalent instruction to bypass the cache,
i.e. I wrote it in assembly because the compiler didn't emit that
instruction.

And that is what it boils down to really, the ll/sc instructions are
different from any other instructions in the ISA as they bypass the
cache and are not emitted by the compiler. So, in order to maintain
coherence with other cpus doing atomic updates on memory addresses, or
rather to avoid reading stale values, it's necessary to use the ll
instruction - and this isn't possible from C.

> Thus it is not about global, as global is updated by normal means and
> will update the caches. atomic_t is updated via the ll/sc that ignores
> the cache and causes all this to break down. IOW... broken hardware ;)

Well, to be precise it's about read-modify-write operations - the
architecture does maintain cache coherence in that writes from one CPU
are immediately visible to other CPUs.

FYI spinlocks are also implemented with ll/sc instructions.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 21:46                                         ` David Miller
@ 2011-02-14 22:20                                           ` Steven Rostedt
  2011-02-14 22:21                                             ` Steven Rostedt
                                                               ` (2 more replies)
  2011-02-14 22:37                                           ` Matt Fleming
  1 sibling, 3 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-14 22:20 UTC (permalink / raw)
  To: David Miller
  Cc: peterz, will.newton, jbaron, mathieu.desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh

On Mon, 2011-02-14 at 13:46 -0800, David Miller wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Mon, 14 Feb 2011 16:39:36 -0500
> 
> > Thus it is not about global, as global is updated by normal means and
> > will update the caches. atomic_t is updated via the ll/sc that ignores
> > the cache and causes all this to break down. IOW... broken hardware ;)
> 
> I don't see how cache coherency can possibly work if the hardware
> behaves this way.
> 
> In cache aliasing situations, yes I can understand a L1 cache visibility
> issue being present, but with kernel only stuff that should never happen
> otherwise we have a bug in the arch cache flushing support.

I guess the issue is, if you use ll/sc on memory, you must always use
ll/sc on that memory, otherwise any normal read won't read the proper
cache.

The atomic_read() in this arch uses ll to read the memory directly and
skip the cache. If we make atomic_read() like the other archs:

#define atomic_read(v)	(*(volatile int *)&(v)->counter)

This pulls the counter into cache, and it will not be updated by an
atomic_inc() from another CPU.
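
For reference, a user-space sketch of how that macro behaves. The
atomic_t layout here is assumed for illustration, and
sketch_polls_until() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* User-space stand-in for the kernel type; layout assumed for illustration. */
typedef struct {
	int counter;
} atomic_t;

/*
 * The macro quoted above. The volatile cast makes the compiler emit a
 * real load on every evaluation; it says nothing about which level of
 * the memory hierarchy the hardware satisfies that load from.
 */
#define atomic_read(v)	(*(volatile int *)&(v)->counter)

/* Hypothetical helper: count polls until the counter reaches n, up to a cap. */
static int sketch_polls_until(atomic_t *v, int n, int max_polls)
{
	int polls = 0;

	while (atomic_read(v) != n && polls < max_polls)
		polls++;
	return polls;
}
```

The cap on the loop is only there so the sketch terminates; the point is
that the macro is a plain load and depends entirely on the hardware
keeping that load coherent.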

Ideally, we would like a single atomic_read() but due to these wacky
archs, it may not be possible.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 22:20                                           ` Steven Rostedt
@ 2011-02-14 22:21                                             ` Steven Rostedt
  2011-02-14 22:21                                             ` H. Peter Anvin
  2011-02-14 22:33                                             ` David Miller
  2 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-14 22:21 UTC (permalink / raw)
  To: David Miller
  Cc: peterz, will.newton, jbaron, mathieu.desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh

On Mon, 2011-02-14 at 17:20 -0500, Steven Rostedt wrote:

> I guess the issue is, if you use ll/sc on memory, you must always use
> ll/sc on that memory, otherwise any normal read won't read the proper
> cache.

s/cache/memory/

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 22:20                                           ` Steven Rostedt
  2011-02-14 22:21                                             ` Steven Rostedt
@ 2011-02-14 22:21                                             ` H. Peter Anvin
  2011-02-14 22:29                                               ` Mathieu Desnoyers
       [not found]                                               ` <BLU0-SMTP98BFCC52FD41661DD9CC1E96D00@phx.gbl>
  2011-02-14 22:33                                             ` David Miller
  2 siblings, 2 replies; 113+ messages in thread
From: H. Peter Anvin @ 2011-02-14 22:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Miller, peterz, will.newton, jbaron, mathieu.desnoyers,
	mingo, tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec,
	avi, sam, ddaney, michael, linux-kernel, vapier, cmetcalf,
	dhowells, schwidefsky, heiko.carstens, benh

On 02/14/2011 02:20 PM, Steven Rostedt wrote:
> 
> Ideally, we would like a single atomic_read() but due to these wacky
> archs, it may not be possible.
> 

#ifdef ARCH_ATOMIC_READ_SUCKS_EGGS?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 22:21                                             ` H. Peter Anvin
@ 2011-02-14 22:29                                               ` Mathieu Desnoyers
       [not found]                                               ` <BLU0-SMTP98BFCC52FD41661DD9CC1E96D00@phx.gbl>
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 22:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, David Miller, peterz, will.newton, jbaron, mingo,
	tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 02/14/2011 02:20 PM, Steven Rostedt wrote:
> > 
> > Ideally, we would like a single atomic_read() but due to these wacky
> > archs, it may not be possible.
> > 
> 
> #ifdef ARCH_ATOMIC_READ_SUCKS_EGGS?
> 
> 	-hpa

lol :)

Hrm, I wonder if it might cause problems with combinations of "cmpxchg"
and "read" performed on a variable (without using atomic.h).

Mathieu

> 
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 22:20                                           ` Steven Rostedt
  2011-02-14 22:21                                             ` Steven Rostedt
  2011-02-14 22:21                                             ` H. Peter Anvin
@ 2011-02-14 22:33                                             ` David Miller
  2 siblings, 0 replies; 113+ messages in thread
From: David Miller @ 2011-02-14 22:33 UTC (permalink / raw)
  To: rostedt
  Cc: peterz, will.newton, jbaron, mathieu.desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh

From: Steven Rostedt <rostedt@goodmis.org>
Date: Mon, 14 Feb 2011 17:20:30 -0500

> On Mon, 2011-02-14 at 13:46 -0800, David Miller wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>> Date: Mon, 14 Feb 2011 16:39:36 -0500
>> 
>> > Thus it is not about global, as global is updated by normal means and
>> > will update the caches. atomic_t is updated via the ll/sc that ignores
>> > the cache and causes all this to break down. IOW... broken hardware ;)
>> 
>> I don't see how cache coherency can possibly work if the hardware
>> behaves this way.
>> 
>> In cache aliasing situations, yes I can understand a L1 cache visibility
>> issue being present, but with kernel only stuff that should never happen
>> otherwise we have a bug in the arch cache flushing support.
> 
> I guess the issue is, if you use ll/sc on memory, you must always use
> ll/sc on that memory, otherwise any normal read won't read the proper
> cache.

That also makes no sense at all.

Any update to the L2 cache must be snooped by the L1 cache and cause
an update, otherwise nothing can work correctly.

So every object we use cmpxchg() on in the kernel cannot work on this
architecture?  Is that what you're saying?

If so, a lot of things we do will not work.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                               ` <BLU0-SMTP98BFCC52FD41661DD9CC1E96D00@phx.gbl>
@ 2011-02-14 22:33                                                 ` David Miller
  0 siblings, 0 replies; 113+ messages in thread
From: David Miller @ 2011-02-14 22:33 UTC (permalink / raw)
  To: mathieu.desnoyers
  Cc: hpa, rostedt, peterz, will.newton, jbaron, mingo, tglx, andi,
	roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam, ddaney,
	michael, linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Date: Mon, 14 Feb 2011 17:29:58 -0500

> * H. Peter Anvin (hpa@zytor.com) wrote:
>> On 02/14/2011 02:20 PM, Steven Rostedt wrote:
>> > 
>> > Ideally, we would like a single atomic_read() but due to these wacky
>> > archs, it may not be possible.
>> > 
>> 
>> #ifdef ARCH_ATOMIC_READ_SUCKS_EGGS?
>> 
>> 	-hpa
> 
> lol :)
> 
> Hrm, I wonder if it might cause problems with combinations of "cmpxchg"
> and "read" performed on a variable (without using atomic.h).

We do that everywhere, it has to work.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 21:46                                         ` David Miller
  2011-02-14 22:20                                           ` Steven Rostedt
@ 2011-02-14 22:37                                           ` Matt Fleming
  2011-02-14 23:03                                             ` Mathieu Desnoyers
                                                               ` (3 more replies)
  1 sibling, 4 replies; 113+ messages in thread
From: Matt Fleming @ 2011-02-14 22:37 UTC (permalink / raw)
  To: David Miller
  Cc: rostedt, peterz, will.newton, jbaron, mathieu.desnoyers, hpa,
	mingo, tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec,
	avi, sam, ddaney, michael, linux-kernel, vapier, cmetcalf,
	dhowells, schwidefsky, heiko.carstens, benh

On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Mon, 14 Feb 2011 16:39:36 -0500
> 
> > Thus it is not about global, as global is updated by normal means
> > and will update the caches. atomic_t is updated via the ll/sc that
> > ignores the cache and causes all this to break down. IOW... broken
> > hardware ;)
> 
> I don't see how cache coherency can possibly work if the hardware
> behaves this way.

Cache coherency is still maintained provided writes/reads both go
through the cache ;-)

The problem is that for read-modify-write operations the arbitration
logic that decides who "wins" and is allowed to actually perform the
write, assuming two or more CPUs are competing for a single memory
address, is not implemented in the cache controller, I think. I'm not a
hardware engineer and I never understood how the arbitration logic
worked but I'm guessing that's the reason that the ll/sc instructions
bypass the cache.

Which is why the atomic_t functions worked out really well for that
arch, such that any accesses to an atomic_t * had to go through the
wrapper functions.
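
For readers less familiar with ll/sc: a read-modify-write built on it is
a retry loop, where the store only succeeds if nobody else wrote the
location since the load. The sketch below uses C11's
atomic_compare_exchange_weak() as a stand-in with the same shape. It is
not the arch code under discussion, and sketch_atomic_inc_return() is a
made-up name.

```c
#include <assert.h>
#include <stdatomic.h>

static int sketch_atomic_inc_return(atomic_int *v)
{
	int old = atomic_load(v);	/* plays the role of "ll" */

	/* Plays the role of "sc": fails and retries if *v changed meanwhile. */
	while (!atomic_compare_exchange_weak(v, &old, old + 1))
		;	/* on failure, old has been refreshed with the current value */

	return old + 1;
}
```

The "who wins" arbitration mentioned above is whatever makes the
store-conditional step fail for the losers; where that logic lives is a
hardware detail this sketch does not model.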

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 22:37                                           ` Matt Fleming
@ 2011-02-14 23:03                                             ` Mathieu Desnoyers
       [not found]                                             ` <BLU0-SMTP166A8555C791786059B0FF96D00@phx.gbl>
                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 23:03 UTC (permalink / raw)
  To: Matt Fleming
  Cc: David Miller, rostedt, peterz, will.newton, jbaron, hpa, mingo,
	tglx, andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh, Paul E. McKenney

* Matt Fleming (matt@console-pimps.org) wrote:
> On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Steven Rostedt <rostedt@goodmis.org>
> > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > 
> > > Thus it is not about global, as global is updated by normal means
> > > and will update the caches. atomic_t is updated via the ll/sc that
> > > ignores the cache and causes all this to break down. IOW... broken
> > > hardware ;)
> > 
> > I don't see how cache coherency can possibly work if the hardware
> > behaves this way.
> 
> Cache coherency is still maintained provided writes/reads both go
> through the cache ;-)
> 
> The problem is that for read-modify-write operations the arbitration
> logic that decides who "wins" and is allowed to actually perform the
> write, assuming two or more CPUs are competing for a single memory
> address, is not implemented in the cache controller, I think. I'm not a
> hardware engineer and I never understood how the arbitration logic
> worked but I'm guessing that's the reason that the ll/sc instructions
> bypass the cache.
> 
> Which is why the atomic_t functions worked out really well for that
> arch, such that any accesses to an atomic_t * had to go through the
> wrapper functions.

If this is true, then we have bugs in lots of xchg/cmpxchg users (which
do not reside in atomic.h), e.g.:

fs/fs_struct.c:
int current_umask(void)
{
        return current->fs->umask;
}
EXPORT_SYMBOL(current_umask);

kernel/sys.c:
SYSCALL_DEFINE1(umask, int, mask)
{
        mask = xchg(&current->fs->umask, mask & S_IRWXUGO);
        return mask;
}

The solution to this would be to force all xchg/cmpxchg users to swap to
atomic.h variables, which would force the ll semantic on read. But I'd
really like to see where this is documented first -- or which PowerPC
engineer we should talk to.
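
The pattern in question, reduced to a user-space sketch: one path
updates a variable with an atomic exchange, another reads it with a
plain load. The GCC/Clang __atomic_exchange_n() builtin stands in for
the kernel's xchg() here, and the sketch_* names are hypothetical.

```c
#include <assert.h>

#define SKETCH_S_IRWXUGO 0777	/* same value as the kernel's S_IRWXUGO */

static int sketch_umask = 022;

/* Like sys_umask(): atomically swap in the new mask, return the old one. */
static int sketch_set_umask(int mask)
{
	return __atomic_exchange_n(&sketch_umask, mask & SKETCH_S_IRWXUGO,
				   __ATOMIC_SEQ_CST);
}

/* Like current_umask(): a plain, non-atomic load of the same variable. */
static int sketch_current_umask(void)
{
	return sketch_umask;
}
```

On cache-coherent hardware the plain load sees the exchanged value. If
the exchange really bypassed the cache, the plain load is the access
that would go stale.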

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                             ` <BLU0-SMTP166A8555C791786059B0FF96D00@phx.gbl>
@ 2011-02-14 23:09                                               ` Paul E. McKenney
  2011-02-14 23:29                                                 ` Mathieu Desnoyers
                                                                   ` (3 more replies)
  0 siblings, 4 replies; 113+ messages in thread
From: Paul E. McKenney @ 2011-02-14 23:09 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Matt Fleming, David Miller, rostedt, peterz, will.newton, jbaron,
	hpa, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

On Mon, Feb 14, 2011 at 06:03:01PM -0500, Mathieu Desnoyers wrote:
> * Matt Fleming (matt@console-pimps.org) wrote:
> > On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> > David Miller <davem@davemloft.net> wrote:
> > 
> > > From: Steven Rostedt <rostedt@goodmis.org>
> > > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > > 
> > > > Thus it is not about global, as global is updated by normal means
> > > > and will update the caches. atomic_t is updated via the ll/sc that
> > > > ignores the cache and causes all this to break down. IOW... broken
> > > > hardware ;)
> > > 
> > > I don't see how cache coherency can possibly work if the hardware
> > > behaves this way.
> > 
> > Cache coherency is still maintained provided writes/reads both go
> > through the cache ;-)
> > 
> > The problem is that for read-modify-write operations the arbitration
> > logic that decides who "wins" and is allowed to actually perform the
> > write, assuming two or more CPUs are competing for a single memory
> > address, is not implemented in the cache controller, I think. I'm not a
> > hardware engineer and I never understood how the arbitration logic
> > worked but I'm guessing that's the reason that the ll/sc instructions
> > bypass the cache.
> > 
> > Which is why the atomic_t functions worked out really well for that
> > arch, such that any accesses to an atomic_t * had to go through the
> > wrapper functions.

???

What CPU family are we talking about here?  For cache coherent CPUs,
cache coherence really is supposed to work, even for mixed atomic and
non-atomic instructions to the same variable.

							Thanx, Paul

> If this is true, then we have bugs in lots of xchg/cmpxchg users (which
> do not reside in atomic.h), e.g.:
> 
> fs/fs_struct.c:
> int current_umask(void)
> {
>         return current->fs->umask;
> }
> EXPORT_SYMBOL(current_umask);
> 
> kernel/sys.c:
> SYSCALL_DEFINE1(umask, int, mask)
> {
>         mask = xchg(&current->fs->umask, mask & S_IRWXUGO);
>         return mask;
> }
> 
> The solution to this would be to force all xchg/cmpxchg users to swap to
> atomic.h variables, which would force the ll semantic on read. But I'd
> really like to see where this is documented first -- or which PowerPC
> engineer we should talk to.
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 22:37                                           ` Matt Fleming
  2011-02-14 23:03                                             ` Mathieu Desnoyers
       [not found]                                             ` <BLU0-SMTP166A8555C791786059B0FF96D00@phx.gbl>
@ 2011-02-14 23:19                                             ` H. Peter Anvin
  2011-02-15 11:01                                               ` Will Newton
       [not found]                                             ` <BLU0-SMTP637B2E9372CFBF3A0B5B0996D00@phx.gbl>
  3 siblings, 1 reply; 113+ messages in thread
From: H. Peter Anvin @ 2011-02-14 23:19 UTC (permalink / raw)
  To: Matt Fleming
  Cc: David Miller, rostedt, peterz, will.newton, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On 02/14/2011 02:37 PM, Matt Fleming wrote:
>>
>> I don't see how cache coherency can possibly work if the hardware
>> behaves this way.
> 
> Cache coherency is still maintained provided writes/reads both go
> through the cache ;-)
> 
> The problem is that for read-modify-write operations the arbitration
> logic that decides who "wins" and is allowed to actually perform the
> write, assuming two or more CPUs are competing for a single memory
> address, is not implemented in the cache controller, I think. I'm not a
> hardware engineer and I never understood how the arbitration logic
> worked but I'm guessing that's the reason that the ll/sc instructions
> bypass the cache.
> 
> Which is why the atomic_t functions worked out really well for that
> arch, such that any accesses to an atomic_t * had to go through the
> wrapper functions.

I'm sorry... this doesn't compute.  Either reads can work normally (go
through the cache) in which case atomic_read() can simply be a read or
they don't, so I don't understand this at all.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                             ` <BLU0-SMTP637B2E9372CFBF3A0B5B0996D00@phx.gbl>
@ 2011-02-14 23:25                                               ` David Miller
  2011-02-14 23:34                                                 ` Mathieu Desnoyers
       [not found]                                                 ` <20110214233405.GC17432@Krystal>
  0 siblings, 2 replies; 113+ messages in thread
From: David Miller @ 2011-02-14 23:25 UTC (permalink / raw)
  To: mathieu.desnoyers
  Cc: matt, rostedt, peterz, will.newton, jbaron, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh, paulmck

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Date: Mon, 14 Feb 2011 18:03:01 -0500

> If this is true, then we have bugs in lots of xchg/cmpxchg users (which
> do not reside in atomic.h), e.g.:
> 
> fs/fs_struct.c:
> int current_umask(void)
> {
>         return current->fs->umask;
> }
> EXPORT_SYMBOL(current_umask);
> 
> kernel/sys.c:
> SYSCALL_DEFINE1(umask, int, mask)
> {
>         mask = xchg(&current->fs->umask, mask & S_IRWXUGO);
>         return mask;
> }
> 
> The solution to this would be to force all xchg/cmpxchg users to swap to
> atomic.h variables, which would force the ll semantic on read. But I'd
> really like to see where this is documented first -- or which PowerPC
> engineer we should talk to.

We can't wholesale convert to atomic_t because we do this on variables
of all sizes, not just 32-bit ones.

We do them on pointers in the networking code, for example.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 23:09                                               ` Paul E. McKenney
@ 2011-02-14 23:29                                                 ` Mathieu Desnoyers
       [not found]                                                 ` <BLU0-SMTP4599FAAD7330498472B87396D00@phx.gbl>
                                                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 23:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Matt Fleming, David Miller, rostedt, peterz, will.newton, jbaron,
	hpa, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh,
	Segher Boessenkool, Paul Mackerras

[ added Segher Boessenkool and Paul Mackerras to CC list ]

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Mon, Feb 14, 2011 at 06:03:01PM -0500, Mathieu Desnoyers wrote:
> > * Matt Fleming (matt@console-pimps.org) wrote:
> > > On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> > > David Miller <davem@davemloft.net> wrote:
> > > 
> > > > From: Steven Rostedt <rostedt@goodmis.org>
> > > > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > > > 
> > > > > Thus it is not about global, as global is updated by normal means
> > > > > and will update the caches. atomic_t is updated via the ll/sc that
> > > > > ignores the cache and causes all this to break down. IOW... broken
> > > > > hardware ;)
> > > > 
> > > > I don't see how cache coherency can possibly work if the hardware
> > > > behaves this way.
> > > 
> > > Cache coherency is still maintained provided writes/reads both go
> > > through the cache ;-)
> > > 
> > > The problem is that for read-modify-write operations the arbitration
> > > logic that decides who "wins" and is allowed to actually perform the
> > > write, assuming two or more CPUs are competing for a single memory
> > > address, is not implemented in the cache controller, I think. I'm not a
> > > hardware engineer and I never understood how the arbitration logic
> > > worked but I'm guessing that's the reason that the ll/sc instructions
> > > bypass the cache.
> > > 
> > > Which is why the atomic_t functions worked out really well for that
> > > arch, such that any accesses to an atomic_t * had to go through the
> > > wrapper functions.
> 
> ???
> 
> What CPU family are we talking about here?  For cache coherent CPUs,
> cache coherence really is supposed to work, even for mixed atomic and
> non-atomic instructions to the same variable.
> 

I'm really curious to know which CPU families too. I've used git blame
to see where these lwz/stw instructions were added to powerpc, and it
points to:

commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Sat Aug 11 10:15:30 2007 +1000

    [POWERPC] Implement atomic{, 64}_{read, write}() without volatile
    
    Instead, use asm() like all other atomic operations already do.
    
    Also use inline functions instead of macros; this actually
    improves code generation (some code becomes a little smaller,
    probably because of improved alias information -- just a few
    hundred bytes total on a default kernel build, nothing shocking).
    
    Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
    Signed-off-by: Paul Mackerras <paulus@samba.org>

So let's ping the relevant people to see if there was any reason for
making these atomic read/set operations different from other
architectures in the first place.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 23:25                                               ` David Miller
@ 2011-02-14 23:34                                                 ` Mathieu Desnoyers
       [not found]                                                 ` <20110214233405.GC17432@Krystal>
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 23:34 UTC (permalink / raw)
  To: David Miller
  Cc: matt, rostedt, peterz, will.newton, jbaron, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh, paulmck

* David Miller (davem@davemloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date: Mon, 14 Feb 2011 18:03:01 -0500
> 
> > If this is true, then we have bugs in lots of xchg/cmpxchg users (which
> > do not reside in atomic.h), e.g.:
> > 
> > fs/fs_struct.c:
> > int current_umask(void)
> > {
> >         return current->fs->umask;
> > }
> > EXPORT_SYMBOL(current_umask);
> > 
> > kernel/sys.c:
> > SYSCALL_DEFINE1(umask, int, mask)
> > {
> >         mask = xchg(&current->fs->umask, mask & S_IRWXUGO);
> >         return mask;
> > }
> > 
> > The solution to this would be to force all xchg/cmpxchg users to swap to
> > atomic.h variables, which would force the ll semantic on read. But I'd
> > really like to see where this is documented first -- or which PowerPC
> > engineer we should talk to.
> 
> We can't wholesale convert to atomic_t because we do this on variables of
> all sizes, not just 32-bit ones.
> 
> We do them on pointers in the networking for example.

We have atomic_long_t for this, but yeah, it would kind of suck to have
to create

union {
	atomic_long_t atomic;
	void *ptr;
}

all around the place. Let's see if we can get to know which PowerPC
processor family all this fuss is about, and where this rumour
originates from.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                                 ` <20110214233405.GC17432@Krystal>
@ 2011-02-14 23:52                                                   ` Mathieu Desnoyers
  0 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-14 23:52 UTC (permalink / raw)
  To: David Miller
  Cc: matt, rostedt, peterz, will.newton, jbaron, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam,
	ddaney, michael, linux-kernel, vapier, cmetcalf, dhowells,
	schwidefsky, heiko.carstens, benh, paulmck

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * David Miller (davem@davemloft.net) wrote:
[...]
> > We can't wholesale convert to atomic_t because we do this on variables of
> > all sizes, not just 32-bit ones.
> > 
> > We do them on pointers in the networking for example.
> 
> We have atomic_long_t for this, but yeah, it would kind of suck to have
> to create
> 
> union {
> 	atomic_long_t atomic;
> 	void *ptr;
> }

Actually, using a union for this is probably one of the worst ideas I've
had recently. Just casting the pointer to unsigned long and vice-versa,
using atomic_long_*() ops would do the trick. But let's wait and see if
it's really needed.
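
A rough userspace sketch of the cast-based approach, for illustration
only. It assumes C11 atomics as a stand-in for the kernel's
atomic_long_*() ops, uses uintptr_t where the kernel would use unsigned
long, and the ptr_slot_* names are made up:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* The integer type must be able to hold a pointer for the casts to be safe. */
static_assert(sizeof(uintptr_t) >= sizeof(void *), "pointer must fit");

/* Hypothetical stand-in for an atomic_long_t that carries a pointer. */
typedef atomic_uintptr_t ptr_slot_t;

/* Atomically read the pointer currently stored in the slot. */
static void *ptr_slot_read(ptr_slot_t *slot)
{
	return (void *)atomic_load(slot);
}

/* Atomically swap in a new pointer and return the previous one. */
static void *ptr_slot_xchg(ptr_slot_t *slot, void *new_ptr)
{
	return (void *)atomic_exchange(slot, (uintptr_t)new_ptr);
}

/* Install new_ptr only if the slot still holds old_ptr; returns 1 on success. */
static int ptr_slot_cmpxchg(ptr_slot_t *slot, void *old_ptr, void *new_ptr)
{
	uintptr_t expected = (uintptr_t)old_ptr;

	return atomic_compare_exchange_strong(slot, &expected,
					      (uintptr_t)new_ptr);
}
```

No union is needed: every access goes through the atomic type, and the
casts stay inside the three helpers.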

Thanks,

Mathieu

> 
> all around the place. Let's see if we can get to know which PowerPC
> processor family all this fuss is about, and where this rumour
> originates from.
> 
> Thanks,
> 
> Mathieu
> 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                                 ` <BLU0-SMTP4599FAAD7330498472B87396D00@phx.gbl>
@ 2011-02-15  0:19                                                   ` Segher Boessenkool
  2011-02-15  0:48                                                     ` Mathieu Desnoyers
  2011-02-15  1:29                                                     ` Steven Rostedt
  0 siblings, 2 replies; 113+ messages in thread
From: Segher Boessenkool @ 2011-02-15  0:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Matt Fleming, David Miller, rostedt, peterz,
	will.newton, jbaron, hpa, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh, Segher Boessenkool, Paul Mackerras

>> What CPU family are we talking about here?  For cache coherent CPUs,
>> cache coherence really is supposed to work, even for mixed atomic and
>> non-atomic instructions to the same variable.
>
> I'm really curious to know which CPU families too. I've used git blame
> to see where these lwz/stw instructions were added to powerpc, and it
> points to:
>
> commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89

> So let's ping the relevant people to see if there was any reason for
> making these atomic read/set operations different from other
> architectures in the first place.

lwz is a simple 32-bit load.  On PowerPC, such a load is guaranteed
to be atomic (except some unaligned cases).  stw is similar, for stores.
These are the normal insns, not ll/sc or anything.

At the time, volatile tricks were used to make the accesses atomic; this
patch changed that.  Result is (or should be!) better code generation.

Is there a problem with it?


Segher


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                                 ` <BLU0-SMTP984E876DBDFBC13F4C86F896D00@phx.gbl>
@ 2011-02-15  0:42                                                   ` Paul E. McKenney
  2011-02-15  0:51                                                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 113+ messages in thread
From: Paul E. McKenney @ 2011-02-15  0:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Matt Fleming, David Miller, rostedt, peterz, will.newton, jbaron,
	hpa, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh,
	Segher Boessenkool, Paul Mackerras

On Mon, Feb 14, 2011 at 06:29:47PM -0500, Mathieu Desnoyers wrote:
> [ added Segher Boessenkool and Paul Mackerras to CC list ]
> 
> * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > On Mon, Feb 14, 2011 at 06:03:01PM -0500, Mathieu Desnoyers wrote:
> > > * Matt Fleming (matt@console-pimps.org) wrote:
> > > > On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> > > > David Miller <davem@davemloft.net> wrote:
> > > > 
> > > > > From: Steven Rostedt <rostedt@goodmis.org>
> > > > > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > > > > 
> > > > > > Thus it is not about global, as global is updated by normal means
> > > > > > and will update the caches. atomic_t is updated via the ll/sc that
> > > > > > ignores the cache and causes all this to break down. IOW... broken
> > > > > > hardware ;)
> > > > > 
> > > > > I don't see how cache coherency can possibly work if the hardware
> > > > > behaves this way.
> > > > 
> > > > Cache coherency is still maintained provided writes/reads both go
> > > > through the cache ;-)
> > > > 
> > > > The problem is that for read-modify-write operations the arbitration
> > > > logic that decides who "wins" and is allowed to actually perform the
> > > > write, assuming two or more CPUs are competing for a single memory
> > > > address, is not implemented in the cache controller, I think. I'm not a
> > > > hardware engineer and I never understood how the arbitration logic
> > > > worked but I'm guessing that's the reason that the ll/sc instructions
> > > > bypass the cache.
> > > > 
> > > > Which is why the atomic_t functions worked out really well for that
> > > > arch, such that any accesses to an atomic_t * had to go through the
> > > > wrapper functions.
> > 
> > ???
> > 
> > What CPU family are we talking about here?  For cache coherent CPUs,
> > cache coherence really is supposed to work, even for mixed atomic and
> > non-atomic instructions to the same variable.
> > 
> 
> I'm really curious to know which CPU families too. I've used git blame
> to see where these lwz/stw instructions were added to powerpc, and it
> points to:

But lwz and stw instructions are normal non-atomic PowerPC loads and
stores.  No LL/SC -- those would instead be lwarx and stwcx.

							Thanx, Paul

> commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
> Author: Segher Boessenkool <segher@kernel.crashing.org>
> Date:   Sat Aug 11 10:15:30 2007 +1000
> 
>     [POWERPC] Implement atomic{, 64}_{read, write}() without volatile
>     
>     Instead, use asm() like all other atomic operations already do.
>     
>     Also use inline functions instead of macros; this actually
>     improves code generation (some code becomes a little smaller,
>     probably because of improved alias information -- just a few
>     hundred bytes total on a default kernel build, nothing shocking).
>     
>     Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
>     Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> So let's ping the relevant people to see if there was any reason for
> making these atomic read/set operations different from other
> architectures in the first place.
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15  0:19                                                   ` Segher Boessenkool
@ 2011-02-15  0:48                                                     ` Mathieu Desnoyers
  2011-02-15  1:29                                                     ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-15  0:48 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Paul E. McKenney, Matt Fleming, David Miller, rostedt, peterz,
	will.newton, jbaron, hpa, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh, Paul Mackerras

* Segher Boessenkool (segher@kernel.crashing.org) wrote:
> >> What CPU family are we talking about here?  For cache coherent CPUs,
> >> cache coherence really is supposed to work, even for mixed atomic and
> >> non-atomic instructions to the same variable.
> >
> > I'm really curious to know which CPU families too. I've used git blame
> > to see where these lwz/stw instructions were added to powerpc, and it
> > points to:
> >
> > commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
> 
> > So let's ping the relevant people to see if there was any reason for
> > making these atomic read/set operations different from other
> > architectures in the first place.
> 
> lwz is a simple 32-bit load.  On PowerPC, such a load is guaranteed
> to be atomic (except some unaligned cases).  stw is similar, for stores.
> These are the normal insns, not ll/sc or anything.
> 
> At the time, volatile tricks were used to make the accesses atomic; this
> patch changed that.  Result is (or should be!) better code generation.
> 
> Is there a problem with it?

It seems fine then. I was confused in thinking that Matt was referring
to PowerPC in his statement. It's probably an unrelated
architecture.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15  0:42                                                   ` Paul E. McKenney
@ 2011-02-15  0:51                                                     ` Mathieu Desnoyers
  0 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-15  0:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Matt Fleming, David Miller, rostedt, peterz, will.newton, jbaron,
	hpa, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh,
	Segher Boessenkool, Paul Mackerras

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Mon, Feb 14, 2011 at 06:29:47PM -0500, Mathieu Desnoyers wrote:
> > [ added Segher Boessenkool and Paul Mackerras to CC list ]
> > 
> > * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > > On Mon, Feb 14, 2011 at 06:03:01PM -0500, Mathieu Desnoyers wrote:
> > > > * Matt Fleming (matt@console-pimps.org) wrote:
> > > > > On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> > > > > David Miller <davem@davemloft.net> wrote:
> > > > > 
> > > > > > From: Steven Rostedt <rostedt@goodmis.org>
> > > > > > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > > > > > 
> > > > > > > Thus it is not about global, as global is updated by normal means
> > > > > > > and will update the caches. atomic_t is updated via the ll/sc that
> > > > > > > ignores the cache and causes all this to break down. IOW... broken
> > > > > > > hardware ;)
> > > > > > 
> > > > > > I don't see how cache coherency can possibly work if the hardware
> > > > > > behaves this way.
> > > > > 
> > > > > Cache coherency is still maintained provided writes/reads both go
> > > > > through the cache ;-)
> > > > > 
> > > > > The problem is that for read-modify-write operations the arbitration
> > > > > logic that decides who "wins" and is allowed to actually perform the
> > > > > write, assuming two or more CPUs are competing for a single memory
> > > > > address, is not implemented in the cache controller, I think. I'm not a
> > > > > hardware engineer and I never understood how the arbitration logic
> > > > > worked but I'm guessing that's the reason that the ll/sc instructions
> > > > > bypass the cache.
> > > > > 
> > > > > Which is why the atomic_t functions worked out really well for that
> > > > > arch, such that any accesses to an atomic_t * had to go through the
> > > > > wrapper functions.
> > > 
> > > ???
> > > 
> > > What CPU family are we talking about here?  For cache coherent CPUs,
> > > cache coherence really is supposed to work, even for mixed atomic and
> > > non-atomic instructions to the same variable.
> > > 
> > 
> > I'm really curious to know which CPU families too. I've used git blame
> > to see where these lwz/stw instructions were added to powerpc, and it
> > points to:
> 
> But lwz and stw instructions are normal non-atomic PowerPC loads and
> stores.  No LL/SC -- those would instead be lwarx and stwcx.

Ah, right. Color me confused ;) I think Matt was talking about a secret
"out of tree" architecture. It sure feels like a James Bond movie. :)

Thanks,

Mathieu

> 
> 							Thanx, Paul
> 
> > commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
> > Author: Segher Boessenkool <segher@kernel.crashing.org>
> > Date:   Sat Aug 11 10:15:30 2007 +1000
> > 
> >     [POWERPC] Implement atomic{, 64}_{read, write}() without volatile
> >     
> >     Instead, use asm() like all other atomic operations already do.
> >     
> >     Also use inline functions instead of macros; this actually
> >     improves code generation (some code becomes a little smaller,
> >     probably because of improved alias information -- just a few
> >     hundred bytes total on a default kernel build, nothing shocking).
> >     
> >     Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
> >     Signed-off-by: Paul Mackerras <paulus@samba.org>
> > 
> > So let's ping the relevant people to see if there was any reason for
> > making these atomic read/set operations different from other
> > architectures in the first place.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > -- 
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15  0:19                                                   ` Segher Boessenkool
  2011-02-15  0:48                                                     ` Mathieu Desnoyers
@ 2011-02-15  1:29                                                     ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-15  1:29 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Mathieu Desnoyers, Paul E. McKenney, Matt Fleming, David Miller,
	peterz, will.newton, jbaron, hpa, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh, Paul Mackerras

On Tue, 2011-02-15 at 01:19 +0100, Segher Boessenkool wrote:
> >> What CPU family are we talking about here?  For cache coherent CPUs,
> >> cache coherence really is supposed to work, even for mixed atomic and
> >> non-atomic instructions to the same variable.
> >
> > I'm really curious to know which CPU families too. I've used git blame
> > to see where these lwz/stw instructions were added to powerpc, and it
> > points to:
> >
> > commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
> 
> > So let's ping the relevant people to see if there was any reason for
> > making these atomic read/set operations different from other
> > architectures in the first place.
> 
> lwz is a simple 32-bit load.  On PowerPC, such a load is guaranteed
> to be atomic (except some unaligned cases).  stw is similar, for stores.
> These are the normal insns, not ll/sc or anything.
> 
> At the time, volatile tricks were used to make the accesses atomic; this
> patch changed that.  Result is (or should be!) better code generation.
> 
> Is there a problem with it?

I guess Mathieu was just getting confused.

But we are looking into whether we can make atomic_read() a generic
function instead of defining it for all archs. It is just something we
could do to fix the include header hell when a static inline contains
atomic_read() and happens to be included by kernel.h. Then we have
atomic.h needing to include kernel.h which needs to include atomic.h
first and so on.

Although, it may just be best if we can do some #ifdef magic to prevent
all this mess anyway.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 23:19                                             ` H. Peter Anvin
@ 2011-02-15 11:01                                               ` Will Newton
  2011-02-15 13:31                                                 ` H. Peter Anvin
  2011-02-15 21:11                                                 ` Will Simoneau
  0 siblings, 2 replies; 113+ messages in thread
From: Will Newton @ 2011-02-15 11:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Matt Fleming, David Miller, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On Mon, Feb 14, 2011 at 11:19 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 02/14/2011 02:37 PM, Matt Fleming wrote:
>>>
>>> I don't see how cache coherency can possibly work if the hardware
>>> behaves this way.
>>
>> Cache coherency is still maintained provided writes/reads both go
>> through the cache ;-)
>>
>> The problem is that for read-modify-write operations the arbitration
>> logic that decides who "wins" and is allowed to actually perform the
>> write, assuming two or more CPUs are competing for a single memory
>> address, is not implemented in the cache controller, I think. I'm not a
>> hardware engineer and I never understood how the arbitration logic
>> worked but I'm guessing that's the reason that the ll/sc instructions
>> bypass the cache.
>>
>> Which is why the atomic_t functions worked out really well for that
>> arch, such that any accesses to an atomic_t * had to go through the
>> wrapper functions.
>
> I'm sorry... this doesn't compute.  Either reads can work normally (go
> through the cache) in which case atomic_read() can simply be a read or
> they don't, so I don't understand this at all.

The CPU in question has two sets of instructions:

  load/store - these go via the cache (write through)
  ll/sc - these operate literally as if there is no cache (they do not
hit on read or write)

This may or may not be a sensible way to architect a CPU, but I think
it is possible to make it work. Making it work efficiently is more of
a challenge.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 23:09                                               ` Paul E. McKenney
                                                                   ` (2 preceding siblings ...)
       [not found]                                                 ` <BLU0-SMTP984E876DBDFBC13F4C86F896D00@phx.gbl>
@ 2011-02-15 11:53                                                 ` Will Newton
  2011-02-18 19:03                                                   ` Paul E. McKenney
  3 siblings, 1 reply; 113+ messages in thread
From: Will Newton @ 2011-02-15 11:53 UTC (permalink / raw)
  To: paulmck
  Cc: Mathieu Desnoyers, Matt Fleming, David Miller, rostedt, peterz,
	jbaron, hpa, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

On Mon, Feb 14, 2011 at 11:09 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:

Hi Paul,

> What CPU family are we talking about here?  For cache coherent CPUs,
> cache coherence really is supposed to work, even for mixed atomic and
> non-atomic instructions to the same variable.

Is there a specific situation you can think of where this would be a
problem? I have to admit to a certain amount of unease with the design
our hardware guys came up with, but I don't have a specific case where
it won't work, just cases where it is less than optimal.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 11:01                                               ` Will Newton
@ 2011-02-15 13:31                                                 ` H. Peter Anvin
  2011-02-15 13:49                                                   ` Steven Rostedt
  2011-02-15 14:04                                                   ` Will Newton
  2011-02-15 21:11                                                 ` Will Simoneau
  1 sibling, 2 replies; 113+ messages in thread
From: H. Peter Anvin @ 2011-02-15 13:31 UTC (permalink / raw)
  To: Will Newton
  Cc: Matt Fleming, David Miller, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On 02/15/2011 03:01 AM, Will Newton wrote:
> 
> The CPU in question has two sets of instructions:
> 
>   load/store - these go via the cache (write through)
>   ll/sc - these operate literally as if there is no cache (they do not
> hit on read or write)
> 
> This may or may not be a sensible way to architect a CPU, but I think
> it is possible to make it work. Making it work efficiently is more of
> a challenge.
> 

a) What "CPU in question" is this?
b) Why should we let this particular insane CPU slow ALL OTHER CPUs down?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 13:31                                                 ` H. Peter Anvin
@ 2011-02-15 13:49                                                   ` Steven Rostedt
  2011-02-15 14:04                                                   ` Will Newton
  1 sibling, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-15 13:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Will Newton, Matt Fleming, David Miller, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On Tue, 2011-02-15 at 05:31 -0800, H. Peter Anvin wrote:
> On 02/15/2011 03:01 AM, Will Newton wrote:

> b) Why should we let this particular insane CPU slow ALL OTHER CPUs down?

Yesterday I got around to reading Linus's interview here:

http://www.itwire.com/opinion-and-analysis/open-sauce/44975-linus-torvalds-looking-back-looking-forward?start=4

This seems appropriate:

"When it comes to "feature I had to include for reasons beyond my
control", it tends to be about crazy hardware doing stupid things that
we just have to work around. Most of the time that's limited to some
specific driver or other, and it's not something that has any relevance
in the "big picture", or that really affects core kernel design very
much. But sometimes it does, and then I really detest it."

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 13:31                                                 ` H. Peter Anvin
  2011-02-15 13:49                                                   ` Steven Rostedt
@ 2011-02-15 14:04                                                   ` Will Newton
  1 sibling, 0 replies; 113+ messages in thread
From: Will Newton @ 2011-02-15 14:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Matt Fleming, David Miller, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On Tue, Feb 15, 2011 at 1:31 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 02/15/2011 03:01 AM, Will Newton wrote:
>>
>> The CPU in question has two sets of instructions:
>>
>>   load/store - these go via the cache (write through)
>>   ll/sc - these operate literally as if there is no cache (they do not
>> hit on read or write)
>>
>> This may or may not be a sensible way to architect a CPU, but I think
>> it is possible to make it work. Making it work efficiently is more of
>> a challenge.
>>
>
> a) What "CPU in question" is this?

http://imgtec.com/meta/meta-technology.asp

> b) Why should we let this particular insane CPU slow ALL OTHER CPUs down?

I didn't propose we do that. I brought it up just to make people aware
that there are these odd architectures out there, and indeed it turns
out Blackfin has some superficially similar issues.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-14 17:27                           ` Peter Zijlstra
  2011-02-14 17:29                             ` Mike Frysinger
  2011-02-14 17:38                             ` Will Newton
@ 2011-02-15 15:20                             ` Heiko Carstens
  2 siblings, 0 replies; 113+ messages in thread
From: Heiko Carstens @ 2011-02-15 15:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Jason Baron, Mathieu Desnoyers, hpa, mingo, tglx,
	andi, roland, rth, masami.hiramatsu.pt, fweisbec, avi, davem,
	sam, ddaney, michael, linux-kernel, Mike Frysinger,
	Chris Metcalf, dhowells, Martin Schwidefsky, benh

On Mon, Feb 14, 2011 at 06:27:27PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-02-14 at 12:18 -0500, Steven Rostedt wrote:
> 
> > mn10300:
> > #define atomic_read(v)  ((v)->counter)
> 
> > tile:
> > static inline int atomic_read(const atomic_t *v)
> > {
> >        return v->counter;
> > }
> 
> Yeah, I already send email to the respective maintainers telling them
> they might want to fix this ;-)
> 
> 
> > So all but a few have basically (as you said on IRC)
> > #define atomic_read(v) ACCESS_ONCE(v)
> 
> ACCESS_ONCE(v->counter), but yeah :-)
> 
> > Those few are blackfin, s390, powerpc and tile.
> > 
> > s390 probably doesn't need that much of a big hammer with atomic_read()
> > (unless it uses it in its own arch that expects it to be such).
> 
> Right, it could just do the volatile thing..

The reason the s390 code looks the way it does is that the volatile
cast was known to generate really bad code.
However, I just tried a few variants (inline asm / ACCESS_ONCE / leave as is)
with gcc 4.5.2 and the resulting code was always identical.
So I'm going to change it to the ACCESS_ONCE variant so it's the same as
everywhere else.
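
For reference, a minimal standalone sketch of what the ACCESS_ONCE
variant of atomic_read() looks like. The atomic_t here is a simplified
stand-in for the kernel type, and __typeof__ is used so the snippet
builds outside the kernel; treat it as an illustration rather than the
actual s390 patch:

```c
#include <assert.h>

/* Same shape as the kernel macro: one real access through a volatile lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Simplified stand-in for the kernel's atomic_t. */
typedef struct { int counter; } atomic_t;

/* The wrapper only works if atomic_t stays a plain word. */
static_assert(sizeof(atomic_t) == sizeof(int), "atomic_t must be one int");

/* Read the counter exactly once; the compiler may not cache or refetch it. */
static inline int atomic_read(const atomic_t *v)
{
	return ACCESS_ONCE(v->counter);
}
```

The volatile cast constrains only the compiler; it relies on the
hardware performing an aligned word load atomically.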

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 11:01                                               ` Will Newton
  2011-02-15 13:31                                                 ` H. Peter Anvin
@ 2011-02-15 21:11                                                 ` Will Simoneau
  2011-02-15 21:27                                                   ` David Miller
  1 sibling, 1 reply; 113+ messages in thread
From: Will Simoneau @ 2011-02-15 21:11 UTC (permalink / raw)
  To: Will Newton
  Cc: H. Peter Anvin, Matt Fleming, David Miller, rostedt, peterz,
	jbaron, mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

[-- Attachment #1: Type: text/plain, Size: 4741 bytes --]

On 11:01 Tue 17 Feb     , Will Newton wrote:
> On Mon, Feb 14, 2011 at 11:19 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > On 02/14/2011 02:37 PM, Matt Fleming wrote:
> >>>
> >>> I don't see how cache coherency can possibly work if the hardware
> >>> behaves this way.
> >>
> >> Cache coherency is still maintained provided writes/reads both go
> >> through the cache ;-)
> >>
> >> The problem is that for read-modify-write operations the arbitration
> >> logic that decides who "wins" and is allowed to actually perform the
> >> write, assuming two or more CPUs are competing for a single memory
> >> address, is not implemented in the cache controller, I think. I'm not a
> >> hardware engineer and I never understood how the arbitration logic
> >> worked but I'm guessing that's the reason that the ll/sc instructions
> >> bypass the cache.
> >>
> >> Which is why the atomic_t functions worked out really well for that
> >> arch, such that any accesses to an atomic_t * had to go through the
> >> wrapper functions.
> >
> > I'm sorry... this doesn't compute.  Either reads can work normally (go
> > through the cache) in which case atomic_read() can simply be a read or
> > they don't, so I don't understand this at all.
> 
> The CPU in question has two sets of instructions:
> 
>   load/store - these go via the cache (write through)
>   ll/sc - these operate literally as if there is no cache (they do not
> hit on read or write)
> 
> This may or may not be a sensible way to architect a CPU, but I think
> it is possible to make it work. Making it work efficiently is more of
> a challenge.

Speaking as a (non-commercial) processor designer here, but feel free to point
out anything I'm wrong on. I have direct experience implementing these
operations in hardware so I'd hope what I say here is right. This information
is definitely relevant to the MIPS R4000 as well as my own hardware. A quick
look at the PPC documentation seems to indicate it's the same there too, and it
should agree with the Wikipedia article on the subject:
http://en.wikipedia.org/wiki/Load-link/store-conditional

The entire point of implementing load-linked (ll) / store-conditional (sc)
instructions is to have lockless atomic primitives *using the cache*. Proper
implementations do not bypass the cache; in fact, the cache coherence protocol
must get involved for them to be correct. If properly implemented, these
operations cause no external bus traffic if the critical section is uncontended
and hits the cache (good for scalability). These are the semantics:

ll: Essentially the same as a normal word load. Implementations will need to do
a little internal book-keeping (i.e. save physical address of last ll
instruction and/or modify coherence state for the cacheline).

sc: Store a word if and only if the address has not been written by any other
processor since the last ll. If the store fails, write 0 to a register,
otherwise write 1.

The address may be tracked on cacheline granularity; this operation may
spuriously fail, depending on implementation details (called "weak" ll/sc).
Arguably the "obvious" way to implement this is to have sc fail if the local
CPU snoops a read-for-ownership for the address in question coming from a
remote CPU. This works because the remote CPU will need to gain the cacheline
for exclusive access before its competing sc can execute. Code is supposed to
put ll/sc in a loop and simply retry the operation until the sc succeeds.
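
As a hedged illustration of that retry loop, here is a userspace
analogue using C11 atomics. atomic_load() stands in for ll and
atomic_compare_exchange_weak() for sc; the add_return() name is made up.
Like weak ll/sc, the weak compare-exchange may fail spuriously, but note
that it compares values rather than tracking writes to the address, so
it is only an approximation of real ll/sc:

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically add delta to *v and return the new value. */
static int add_return(atomic_int *v, int delta)
{
	int old;

	assert(v);			/* caller must pass a valid counter */

	old = atomic_load(v);		/* "ll": observe the current value */

	/* "sc": store only if *v still equals old; otherwise retry. */
	while (!atomic_compare_exchange_weak(v, &old, old + delta))
		;	/* on failure, old was refreshed with the current value */

	return old + delta;
}
```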

Note how the cache and cache coherence protocol are fundamental parts of this
operation; if these instructions simply bypassed the cache, they *could not*
work correctly - how do you detect when the underlying memory has been
modified? You can't simply detect whether the value has changed - it may have
been changed to another value and then back ("ABA" problem). You have to snoop
bus transactions, and that is what the cache and its coherence algorithm
already do. ll/sc can be implemented entirely using the side-effects of the
cache coherence algorithm; my own working hardware implementation does this.
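
A small single-threaded sketch of why comparing values is not enough.
It uses a C11 compare-exchange as the "detect by value" check, with the
intervening writes done inline to keep it deterministic; the function
name is made up for illustration:

```c
#include <assert.h>
#include <stdatomic.h>

/* Returns 1 if a value-comparing update succeeds despite two intervening writes. */
static int aba_goes_unnoticed(void)
{
	atomic_int word;
	int seen;

	atomic_init(&word, 'A');
	seen = atomic_load(&word);	/* observe A */

	/* Some other agent changes the word to B and then back to A. */
	atomic_store(&word, 'B');
	atomic_store(&word, 'A');
	assert(atomic_load(&word) == 'A');

	/* Value comparison only: both writes above are invisible here. */
	return atomic_compare_exchange_strong(&word, &seen, 'C');
}
```

A real sc would be expected to fail in this situation, because the
address was written after the ll, regardless of the final value.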

So, atomically reading the variable can be accomplished with a normal load
instruction. I can't speak for unaligned loads on implementations that do them
in hardware, but at least an aligned load of word size should be atomic on any
sane architecture. Only an atomic read-modify-write of the variable needs to
use ll/sc at all, and only for the reason of preventing another concurrent
modification between the load and store. A plain aligned word store should be
atomic too, but it's not too useful because another concurrent store would
not be ordered relative to the local store.

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 21:11                                                 ` Will Simoneau
@ 2011-02-15 21:27                                                   ` David Miller
  2011-02-15 21:56                                                     ` Will Simoneau
  2011-02-15 22:20                                                     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 113+ messages in thread
From: David Miller @ 2011-02-15 21:27 UTC (permalink / raw)
  To: simoneau
  Cc: will.newton, hpa, matt, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

From: Will Simoneau <simoneau@ele.uri.edu>
Date: Tue, 15 Feb 2011 16:11:23 -0500

> Note how the cache and cache coherence protocol are fundamental parts of this
> operation; if these instructions simply bypassed the cache, they *could not*
> work correctly - how do you detect when the underlying memory has been
> modified?

The issue here isn't L2 cache bypassing, it's local L1 cache bypassing.

The chips in question apparently do not consult the local L1 cache on
"ll" instructions.

Therefore you must only ever access such atomic data using "ll"
instructions.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 21:27                                                   ` David Miller
@ 2011-02-15 21:56                                                     ` Will Simoneau
  2011-02-16 10:15                                                       ` Will Newton
  2011-02-15 22:20                                                     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 113+ messages in thread
From: Will Simoneau @ 2011-02-15 21:56 UTC (permalink / raw)
  To: David Miller
  Cc: will.newton, hpa, matt, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

On 13:27 Tue 15 Feb     , David Miller wrote:
> From: Will Simoneau <simoneau@ele.uri.edu>
> Date: Tue, 15 Feb 2011 16:11:23 -0500
> 
> > Note how the cache and cache coherence protocol are fundamental parts of this
> > operation; if these instructions simply bypassed the cache, they *could not*
> > work correctly - how do you detect when the underlying memory has been
> > modified?
> 
> The issue here isn't L2 cache bypassing, it's local L1 cache bypassing.
> 
> The chips in question apparently do not consult the local L1 cache on
> "ll" instructions.
> 
> Therefore you must only ever access such atomic data using "ll"
> instructions.

(I should not have said "underlying memory", since it is correct that
only the L1 caches are the problem here)

That's some really crippled hardware... it does seem like *any* loads
from *any* address updated by an sc would have to be done with ll as
well, else they may load stale values. One could work this into
atomic_read(), but surely there are other places that are problems.

It would be OK if the caches on the hardware in question were to
back-invalidate matching cachelines when the sc is snooped from the bus,
but it sounds like this doesn't happen?

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 21:27                                                   ` David Miller
  2011-02-15 21:56                                                     ` Will Simoneau
@ 2011-02-15 22:20                                                     ` Benjamin Herrenschmidt
  2011-02-16  8:35                                                       ` Ingo Molnar
  1 sibling, 1 reply; 113+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-15 22:20 UTC (permalink / raw)
  To: David Miller
  Cc: simoneau, will.newton, hpa, matt, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens

On Tue, 2011-02-15 at 13:27 -0800, David Miller wrote:
> From: Will Simoneau <simoneau@ele.uri.edu>
> Date: Tue, 15 Feb 2011 16:11:23 -0500
> 
> > Note how the cache and cache coherence protocol are fundamental parts of this
> > operation; if these instructions simply bypassed the cache, they *could not*
> > work correctly - how do you detect when the underlying memory has been
> > modified?
> 
> The issue here isn't L2 cache bypassing, it's local L1 cache bypassing.
> 
> The chips in question apparently do not consult the local L1 cache on
> "ll" instructions.
> 
> Therefore you must only ever access such atomic data using "ll"
> instructions.

Note that it's actually a reasonable design choice to not consult the L1
in this case ... as long as you invalidate it on the way. That's how
current powerpcs do it afaik: they send a kill to any matching L1 line
along with reading from the L2. (Of course, L1 has to be write-through for
that to work).

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 22:20                                                     ` Benjamin Herrenschmidt
@ 2011-02-16  8:35                                                       ` Ingo Molnar
  2011-02-17  1:04                                                         ` H. Peter Anvin
  0 siblings, 1 reply; 113+ messages in thread
From: Ingo Molnar @ 2011-02-16  8:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Miller, simoneau, will.newton, hpa, matt, rostedt, peterz,
	jbaron, mathieu.desnoyers, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens


* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Tue, 2011-02-15 at 13:27 -0800, David Miller wrote:
> > From: Will Simoneau <simoneau@ele.uri.edu>
> > Date: Tue, 15 Feb 2011 16:11:23 -0500
> > 
> > > Note how the cache and cache coherence protocol are fundamental parts of this
> > > operation; if these instructions simply bypassed the cache, they *could not*
> > > work correctly - how do you detect when the underlying memory has been
> > > modified?
> > 
> > The issue here isn't L2 cache bypassing, it's local L1 cache bypassing.
> > 
> > The chips in question apparently do not consult the local L1 cache on
> > "ll" instructions.
> > 
> > Therefore you must only ever access such atomic data using "ll"
> > instructions.
> 
> Note that it's actually a reasonable design choice to not consult the L1
> in this case ... as long as you invalidate it on the way. That's how
> current powerpcs do it afaik: they send a kill to any matching L1 line
> along with reading from the L2. (Of course, L1 has to be write-through for
> that to work).

Just curious: how does this work if there's an interrupt (or NMI) right after the 
invalidate instruction but before the 'll' instruction? The IRQ/NMI may refill the 
L1. Or are the two instructions coupled by hw (they form a single instruction in 
essence) and irqs/NMIs are inhibited in between?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 21:56                                                     ` Will Simoneau
@ 2011-02-16 10:15                                                       ` Will Newton
  2011-02-16 12:18                                                         ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Will Newton @ 2011-02-16 10:15 UTC (permalink / raw)
  To: Will Simoneau
  Cc: David Miller, hpa, matt, rostedt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On Tue, Feb 15, 2011 at 9:56 PM, Will Simoneau <simoneau@ele.uri.edu> wrote:
> On 13:27 Tue 15 Feb     , David Miller wrote:
>> From: Will Simoneau <simoneau@ele.uri.edu>
>> Date: Tue, 15 Feb 2011 16:11:23 -0500
>>
>> > Note how the cache and cache coherence protocol are fundamental parts of this
>> > operation; if these instructions simply bypassed the cache, they *could not*
>> > work correctly - how do you detect when the underlying memory has been
>> > modified?
>>
>> The issue here isn't L2 cache bypassing, it's local L1 cache bypassing.
>>
>> The chips in question apparently do not consult the local L1 cache on
>> "ll" instructions.
>>
>> Therefore you must only ever access such atomic data using "ll"
>> instructions.
>
> (I should not have said "underlying memory", since it is correct that
> only the L1 caches are the problem here)
>
> That's some really crippled hardware... it does seem like *any* loads
> from *any* address updated by an sc would have to be done with ll as
> well, else they may load stale values. One could work this into
> atomic_read(), but surely there are other places that are problems.

I think it's actually ok, atomics have arch implemented accessors, as
do spinlocks and atomic bitops. Those are the only places we do sc so
we can make sure we always ll or invalidate manually.

> It would be OK if the caches on the hardware in question were to
> back-invalidate matching cachelines when the sc is snooped from the bus,
> but it sounds like this doesn't happen?

Yes it's possible to manually invalidate the line but it is not
automatic. Manual invalidation is actually quite reasonable in many
cases because you never see a bad value, just a potentially stale one,
so many of the races are harmless in practice.

I think you're correct in your comments re multi-processor cache
coherence and the performance problems associated with not doing ll/sc
in the cache. I believe some of the reasoning behind the current
implementation is to allow different processors in the same SoC to
participate in the atomic store protocol without having a fully
coherent cache (and implementing a full cache coherence protocol).
It's my understanding that the ll/sc is implemented somewhere beyond
the cache in the bus fabric.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-16 10:15                                                       ` Will Newton
@ 2011-02-16 12:18                                                         ` Steven Rostedt
  2011-02-16 12:41                                                           ` Will Newton
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2011-02-16 12:18 UTC (permalink / raw)
  To: Will Newton
  Cc: Will Simoneau, David Miller, hpa, matt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On Wed, 2011-02-16 at 10:15 +0000, Will Newton wrote:

> > That's some really crippled hardware... it does seem like *any* loads
> > from *any* address updated by an sc would have to be done with ll as
> > well, else they may load stale values. One could work this into
> > atomic_read(), but surely there are other places that are problems.
> 
> I think it's actually ok, atomics have arch implemented accessors, as
> do spinlocks and atomic bitops. Those are the only places we do sc so
> we can make sure we always ll or invalidate manually.

I'm curious, how is cmpxchg() implemented on this architecture? As there
are several places in the kernel that use this on regular variables
without any "accessor" functions.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-16 12:18                                                         ` Steven Rostedt
@ 2011-02-16 12:41                                                           ` Will Newton
  2011-02-16 13:24                                                             ` Mathieu Desnoyers
                                                                               ` (3 more replies)
  0 siblings, 4 replies; 113+ messages in thread
From: Will Newton @ 2011-02-16 12:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Will Simoneau, David Miller, hpa, matt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

On Wed, Feb 16, 2011 at 12:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 2011-02-16 at 10:15 +0000, Will Newton wrote:
>
>> > That's some really crippled hardware... it does seem like *any* loads
>> > from *any* address updated by an sc would have to be done with ll as
>> > well, else they may load stale values. One could work this into
>> > atomic_read(), but surely there are other places that are problems.
>>
>> I think it's actually ok, atomics have arch implemented accessors, as
>> do spinlocks and atomic bitops. Those are the only places we do sc so
>> we can make sure we always ll or invalidate manually.
>
> I'm curious, how is cmpxchg() implemented on this architecture? As there
> are several places in the kernel that use this on regular variables
> without any "accessor" functions.

We can invalidate the cache manually. The current cpu will see the new
value (post-cache invalidate) and the other cpus will see either the
old value or the new value depending on whether they read before or
after the invalidate, which is racy but I don't think it is
problematic. Unless I'm missing something...
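
A rough sketch of the sequence described here, assuming the hardware
offers some per-line invalidate operation. cache_invalidate_line() is a
hypothetical placeholder (a counting no-op in this sketch), and a C11
compare-exchange stands in for the actual ll/sc sequence; neither is
taken from the real arch code.

```c
#include <assert.h>
#include <stdatomic.h>

static int invalidate_calls;	/* bookkeeping for this sketch only */

/* HYPOTHETICAL stand-in for an arch-specific "drop this cache line" op. */
static void cache_invalidate_line(const volatile void *addr)
{
	(void)addr;
	invalidate_calls++;
}

/*
 * cmpxchg() followed by a manual invalidate. Returns the value found at
 * *ptr, as cmpxchg() does; the store happened iff that equals 'old'.
 */
static int cmpxchg_invalidate(atomic_int *ptr, int old, int newval)
{
	int seen = old;

	/* Stand-in for the ll/sc sequence; on failure 'seen' is updated. */
	atomic_compare_exchange_strong(ptr, &seen, newval);

	/* Drop the possibly stale cached copy so plain loads refetch. */
	cache_invalidate_line(ptr);

	return seen;
}
```

Anything that runs between the store and the invalidate can still
observe the old cached value, which is the window the "racy" remark
above refers to.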

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-16 12:41                                                           ` Will Newton
@ 2011-02-16 13:24                                                             ` Mathieu Desnoyers
  2011-02-16 22:51                                                             ` Will Simoneau
                                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-16 13:24 UTC (permalink / raw)
  To: Will Newton
  Cc: Steven Rostedt, Will Simoneau, David Miller, hpa, matt, peterz,
	jbaron, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

* Will Newton (will.newton@gmail.com) wrote:
> On Wed, Feb 16, 2011 at 12:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Wed, 2011-02-16 at 10:15 +0000, Will Newton wrote:
> >
> >> > That's some really crippled hardware... it does seem like *any* loads
> >> > from *any* address updated by an sc would have to be done with ll as
> >> > well, else they may load stale values. One could work this into
> >> > atomic_read(), but surely there are other places that are problems.
> >>
> >> I think it's actually ok, atomics have arch implemented accessors, as
> >> do spinlocks and atomic bitops. Those are the only places we do sc so
> >> we can make sure we always ll or invalidate manually.
> >
> > I'm curious, how is cmpxchg() implemented on this architecture? As there
> > are several places in the kernel that use this on regular variables
> > without any "accessor" functions.
> 
> We can invalidate the cache manually. The current cpu will see the new
> value (post-cache invalidate) and the other cpus will see either the
> old value or the new value depending on whether they read before or
> after the invalidate, which is racy but I don't think it is
> problematic. Unless I'm missing something...

Assuming the invalidate is specific to a cache-line, I'm concerned about
the failure of a scenario like the following:

initially:
foo = 0
bar = 0

CPU A                            CPU B

xchg(&foo, 1);
  ll foo
  sc foo

  -> interrupt

  if (foo == 1)
    xchg(&bar, 1);
      ll bar
      sc bar
      invalidate bar

                                 lbar = bar;
                                 smp_mb()
                                 lfoo = foo;
                                 BUG_ON(lbar == 1 && lfoo == 0);
  invalidate foo

It should be valid to expect that whenever CPU B reads "bar" as 1, it
also reads "foo" as 1. However, in this case, the missing invalidate
on foo keeps the updated cacheline from reaching CPU B. There seems to
be a problem with interrupts/NMIs coming right between sc and
invalidate, as Ingo pointed out.
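
The interleaving above can be replayed deterministically with a toy
model. This is only a sketch under stated assumptions: sc writes straight
to backing memory without touching cached copies, plain loads are served
from the cache when the line is present, and the interrupt handler's
"foo == 1" check is assumed to pass. All names (sc_store(), plain_load(),
replay(), ...) are invented for the example, and smp_mb() is omitted
because the model executes strictly in program order.

```c
#include <assert.h>
#include <stdbool.h>

enum { FOO, BAR, NVARS };

static int  mem[NVARS];		/* backing store: sc writes land here */
static int  cached[NVARS];	/* cached copies returned by plain loads */
static bool valid[NVARS];	/* is a cached copy present? */

/* ASSUMPTION: sc bypasses the cache and leaves cached copies untouched. */
static void sc_store(int var, int val) { mem[var] = val; }

static void invalidate(int var) { valid[var] = false; }

/* A plain load hits the cache if it can, otherwise refills from memory. */
static int plain_load(int var)
{
	if (!valid[var]) {
		cached[var] = mem[var];
		valid[var] = true;
	}
	return cached[var];
}

/*
 * Replay the interleaving; returns true when CPU B would hit the BUG_ON().
 * irq_before_invalidate selects whether the interrupt lands between
 * "sc foo" and "invalidate foo".
 */
static bool replay(bool irq_before_invalidate)
{
	int lbar, lfoo;

	/* Initially foo = bar = 0 and both lines are already cached. */
	for (int i = 0; i < NVARS; i++) {
		mem[i] = 0;
		cached[i] = 0;
		valid[i] = true;
	}

	sc_store(FOO, 1);		/* CPU A: xchg(&foo, 1), ll/sc part */
	if (!irq_before_invalidate)
		invalidate(FOO);

	/* Interrupt on CPU A; its "foo == 1" check is assumed to pass. */
	sc_store(BAR, 1);		/* xchg(&bar, 1) */
	invalidate(BAR);

	lbar = plain_load(BAR);		/* CPU B: line was dropped, sees 1 */
	lfoo = plain_load(FOO);		/* CPU B: may hit a stale line */

	if (irq_before_invalidate)
		invalidate(FOO);	/* CPU A: too late for CPU B */

	return lbar == 1 && lfoo == 0;
}
```

In this model replay(true) reports the BUG_ON() condition and
replay(false) does not, so the failure depends only on the invalidate of
foo being delayed past CPU B's read.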

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-16 12:41                                                           ` Will Newton
  2011-02-16 13:24                                                             ` Mathieu Desnoyers
@ 2011-02-16 22:51                                                             ` Will Simoneau
  2011-02-17  0:53                                                               ` Please watch your cc lists Andi Kleen
  2011-02-17 10:55                                                               ` [PATCH 0/2] jump label: 2.6.38 updates Will Newton
       [not found]                                                             ` <BLU0-SMTP80F56386E7E060A3B2020B96D20@phx.gbl>
       [not found]                                                             ` <BLU0-SMTP71BCB155CBAE79997EE08D96D20@phx.gbl>
  3 siblings, 2 replies; 113+ messages in thread
From: Will Simoneau @ 2011-02-16 22:51 UTC (permalink / raw)
  To: Will Newton
  Cc: Steven Rostedt, David Miller, hpa, matt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

On 12:41 Wed 16 Feb     , Will Newton wrote:
> On Wed, Feb 16, 2011 at 12:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > I'm curious, how is cmpxchg() implemented on this architecture? As there
> > are several places in the kernel that use this on regular variables
> > without any "accessor" functions.
> 
> We can invalidate the cache manually. The current cpu will see the new
> value (post-cache invalidate) and the other cpus will see either the
> old value or the new value depending on whether they read before or
> after the invalidate, which is racy but I don't think it is
> problematic. Unless I'm missing something...

If I understand this correctly, the manual invalidates must propagate to
all CPUs that potentially read the value, even if there is no
contention. Doesn't this involve IPIs? How does it not suck?

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Please watch your cc lists
  2011-02-16 22:51                                                             ` Will Simoneau
@ 2011-02-17  0:53                                                               ` Andi Kleen
  2011-02-17  0:56                                                                 ` David Miller
  2011-02-17 10:55                                                               ` [PATCH 0/2] jump label: 2.6.38 updates Will Newton
  1 sibling, 1 reply; 113+ messages in thread
From: Andi Kleen @ 2011-02-17  0:53 UTC (permalink / raw)
  To: Will Simoneau
  Cc: Will Newton, Steven Rostedt, David Miller, hpa, matt, peterz,
	jbaron, mathieu.desnoyers, mingo, tglx, andi, roland, rth,
	masami.hiramatsu.pt, fweisbec, avi, sam, ddaney, michael,
	linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens, benh


Folks, if you want to invent new masochistic programming models
like this please do it in its own thread with a reduced cc list.

Thank you,

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: Please watch your cc lists
  2011-02-17  0:53                                                               ` Please watch your cc lists Andi Kleen
@ 2011-02-17  0:56                                                                 ` David Miller
  2011-02-17  1:04                                                                   ` Michael Witten
  0 siblings, 1 reply; 113+ messages in thread
From: David Miller @ 2011-02-17  0:56 UTC (permalink / raw)
  To: andi
  Cc: simoneau, will.newton, rostedt, hpa, matt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

From: Andi Kleen <andi@firstfloor.org>
Date: Thu, 17 Feb 2011 01:53:00 +0100

> 
> Folks, if you want to invent new masochistic programming models
> like this please do it in its own thread with a reduced cc list.

Well, Andi, since you removed the subject nobody has any idea
what thread you are even referring to.

This makes you as much of a bozo as the people you are chiding
right now.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-16  8:35                                                       ` Ingo Molnar
@ 2011-02-17  1:04                                                         ` H. Peter Anvin
  2011-02-17 12:51                                                           ` Ingo Molnar
  0 siblings, 1 reply; 113+ messages in thread
From: H. Peter Anvin @ 2011-02-17  1:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Benjamin Herrenschmidt, David Miller, simoneau, will.newton,
	matt, rostedt, peterz, jbaron, mathieu.desnoyers, tglx, andi,
	roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam, ddaney,
	michael, linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens

On 02/16/2011 12:35 AM, Ingo Molnar wrote:
> 
> Just curious: how does this work if there's an interrupt (or NMI) right after the 
> invalidate instruction but before the 'll' instruction? The IRQ/NMI may refill the 
> L1. Or are the two instructions coupled by hw (they form a single instruction in 
> essence) and irqs/NMIs are inhibited in between?
> 

http://en.wikipedia.org/wiki/Load-link/store-conditional

	-hpa

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: Please watch your cc lists
  2011-02-17  0:56                                                                 ` David Miller
@ 2011-02-17  1:04                                                                   ` Michael Witten
  0 siblings, 0 replies; 113+ messages in thread
From: Michael Witten @ 2011-02-17  1:04 UTC (permalink / raw)
  To: David Miller
  Cc: andi, simoneau, will.newton, rostedt, hpa, matt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

On Wed, Feb 16, 2011 at 18:56, David Miller <davem@davemloft.net> wrote:
> Well, Andi, since you removed the subject nobody has any idea
> what thread you are even referring to.

However, all of the necessary information is in the email headers:

    In-Reply-To: <20110216225151.GA10435@ele.uri.edu>

    References: <4D59B891.8010300@zytor.com>
      <AANLkTimp9xKbZdDZpONrxDkfMSAiQre0v=SOsJUnnoWA@mail.gmail.com>
      <20110215211123.GA3094@ele.uri.edu>
      <20110215.132702.39199169.davem@davemloft.net>
      <20110215215604.GA3177@ele.uri.edu>
      <AANLkTikXy+AJ3tdEkEN--xJPefbXJ4-OVS3cg6R7yXzc@mail.gmail.com>
      <1297858734.23343.138.camel@gandalf.stny.rr.com>
      <AANLkTinzr6rb=WwFs7QApsvdy5f7PHZ1qS9ZVrncEzZD@mail.gmail.com>
      <20110216225151.GA10435@ele.uri.edu>

It's a damn shame that our email tools ignore such useful information.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                                             ` <BLU0-SMTP80F56386E7E060A3B2020B96D20@phx.gbl>
@ 2011-02-17  1:55                                                               ` Masami Hiramatsu
  2011-02-17  3:19                                                                 ` H. Peter Anvin
  0 siblings, 1 reply; 113+ messages in thread
From: Masami Hiramatsu @ 2011-02-17  1:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Newton, Steven Rostedt, Will Simoneau, David Miller, hpa,
	matt, peterz, jbaron, mingo, tglx, andi, roland, rth, fweisbec,
	avi, sam, ddaney, michael, linux-kernel, vapier, cmetcalf,
	dhowells, schwidefsky, heiko.carstens, benh, 2nddept-manager

(2011/02/16 22:24), Mathieu Desnoyers wrote:
> * Will Newton (will.newton@gmail.com) wrote:
>> On Wed, Feb 16, 2011 at 12:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>> On Wed, 2011-02-16 at 10:15 +0000, Will Newton wrote:
>>>
>>>>> That's some really crippled hardware... it does seem like *any* loads
>>>>> from *any* address updated by an sc would have to be done with ll as
>>>>> well, else they may load stale values. One could work this into
>>>>> atomic_read(), but surely there are other places that are problems.
>>>>
>>>> I think it's actually ok, atomics have arch implemented accessors, as
>>>> do spinlocks and atomic bitops. Those are the only places we do sc so
>>>> we can make sure we always ll or invalidate manually.
>>>
>>> I'm curious, how is cmpxchg() implemented on this architecture? As there
>>> are several places in the kernel that use this on regular variables
>>> without any "accessor" functions.
>>
>> We can invalidate the cache manually. The current cpu will see the new
>> value (post-cache invalidate) and the other cpus will see either the
>> old value or the new value depending on whether they read before or
>> after the invalidate, which is racy but I don't think it is
>> problematic. Unless I'm missing something...
> 
> Assuming the invalidate is specific to a cache-line, I'm concerned about
> the failure of a scenario like the following:
> 
> initially:
> foo = 0
> bar = 0
> 
> CPU A                            CPU B
> 
> xchg(&foo, 1);
>   ll foo
>   sc foo
> 
>   -> interrupt
> 
>   if (foo == 1)
>     xchg(&bar, 1);
>       ll bar
>       sc bar
>       invalidate bar
> 
>                                  lbar = bar;
>                                  smp_mb()
>                                  lfoo = foo;
>                                  BUG_ON(lbar == 1 && lfoo == 0);
>   invalidate foo
> 
> It should be valid to expect that whenever CPU B reads "bar" as 1, it
> also reads "foo" as 1. However, in this case, the missing invalidate
> on foo keeps the updated cacheline from reaching CPU B. There seems to
> be a problem with interrupts/NMIs coming right between sc and
> invalidate, as Ingo pointed out.

Hmm, I think that is a miscoding of ll/sc.
If I understand correctly, cache invalidation should usually be done
right before storing the value, as the MSI protocol does.
(or, sc should atomically invalidate the cache line)

Thank you,

-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-17  1:55                                                               ` Masami Hiramatsu
@ 2011-02-17  3:19                                                                 ` H. Peter Anvin
  2011-02-17 16:03                                                                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 113+ messages in thread
From: H. Peter Anvin @ 2011-02-17  3:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mathieu Desnoyers, Will Newton, Steven Rostedt, Will Simoneau,
	David Miller, matt, peterz, jbaron, mingo, tglx, andi, roland,
	rth, fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh,
	2nddept-manager

On 02/16/2011 05:55 PM, Masami Hiramatsu wrote:
> 
> Hmm, I think that is a miscoding of ll/sc.
> If I understand correctly, cache invalidation should usually be done
> right before storing the value, as the MSI protocol does.
> (or, sc should atomically invalidate the cache line)
> 

I suspect in this case one should flush the cache line before ll (a
cache flush will typically invalidate the ll/sc link.)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                                             ` <BLU0-SMTP71BCB155CBAE79997EE08D96D20@phx.gbl>
@ 2011-02-17  3:36                                                               ` Steven Rostedt
  2011-02-17 16:13                                                                 ` Mathieu Desnoyers
       [not found]                                                                 ` <BLU0-SMTP51D40A5B1DACA8883D6AB596D50@phx.gbl>
  0 siblings, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-17  3:36 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Newton, Will Simoneau, David Miller, hpa, matt, peterz,
	jbaron, mingo, tglx, roland, rth, masami.hiramatsu.pt, fweisbec,
	avi, sam, ddaney, michael, linux-kernel, vapier, cmetcalf,
	dhowells, schwidefsky, heiko.carstens, benh

[ Removed Andi as I believe this is the mysterious thread he was talking
about. Anyone else want to be removed? ]


On Wed, 2011-02-16 at 08:24 -0500, Mathieu Desnoyers wrote:
> * Will Newton (will.newton@gmail.com) wrote:

> initially:
> foo = 0
> bar = 0
> 
> CPU A                            CPU B
> 
> xchg(&foo, 1);
>   ll foo
>   sc foo
> 
>   -> interrupt
> 
>   if (foo == 1)
>     xchg(&bar, 1);
>       ll bar
>       sc bar
>       invalidate bar
> 
>                                  lbar = bar;
>                                  smp_mb()

Question: Does an mb() flush the whole cache, or does it just make sure that
read/write operations finish before starting new ones?

>                                  lfoo = foo;

IOW, will that smp_mb() really make lfoo read the new foo in memory? If
foo happens to still be in cache and no coherency has been performed to
flush it, would it just simply read foo straight from the cache?

-- Steve

>                                  BUG_ON(lbar == 1 && lfoo == 0);
>   invalidate foo
> 
> It should be valid to expect that whenever CPU B reads "bar" as 1, it
> also reads "foo" as 1. However, in this case, the missing invalidate
> on foo keeps the updated cacheline from reaching CPU B. There seems to
> be a problem with interrupts/NMIs coming right between sc and
> invalidate, as Ingo pointed out.
> 
> Thanks,
> 
> Mathieu
> 



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-16 22:51                                                             ` Will Simoneau
  2011-02-17  0:53                                                               ` Please watch your cc lists Andi Kleen
@ 2011-02-17 10:55                                                               ` Will Newton
  1 sibling, 0 replies; 113+ messages in thread
From: Will Newton @ 2011-02-17 10:55 UTC (permalink / raw)
  To: Will Simoneau
  Cc: Steven Rostedt, David Miller, hpa, matt, peterz, jbaron,
	mathieu.desnoyers, mingo, tglx, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

On Wed, Feb 16, 2011 at 10:51 PM, Will Simoneau <simoneau@ele.uri.edu> wrote:
> On 12:41 Wed 16 Feb     , Will Newton wrote:
>> On Wed, Feb 16, 2011 at 12:18 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > I'm curious, how is cmpxchg() implemented on this architecture? As there
>> > are several places in the kernel that use this on regular variables
>> > without any "accessor" functions.
>>
>> We can invalidate the cache manually. The current cpu will see the new
>> value (post-cache invalidate) and the other cpus will see either the
>> old value or the new value depending on whether they read before or
>> after the invalidate, which is racy but I don't think it is
>> problematic. Unless I'm missing something...
>
> If I understand this correctly, the manual invalidates must propagate to
> all CPUs that potentially read the value, even if there is no
> contention. Doesn't this involve IPIs? How does it not suck?

The cache is shared between cores (in that regard it's more like a
hyper-threaded core than a true multi-core) so is completely coherent,
so this is the one bit that doesn't really suck! Having spoken to our
hardware guys I'm confident that we'll only ever build a handful of
chip designs with the current way of doing ll/sc and hopefully future
cores will do this the "right" way.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-17  1:04                                                         ` H. Peter Anvin
@ 2011-02-17 12:51                                                           ` Ingo Molnar
  0 siblings, 0 replies; 113+ messages in thread
From: Ingo Molnar @ 2011-02-17 12:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Benjamin Herrenschmidt, David Miller, simoneau, will.newton,
	matt, rostedt, peterz, jbaron, mathieu.desnoyers, tglx, andi,
	roland, rth, masami.hiramatsu.pt, fweisbec, avi, sam, ddaney,
	michael, linux-kernel, vapier, cmetcalf, dhowells, schwidefsky,
	heiko.carstens


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 02/16/2011 12:35 AM, Ingo Molnar wrote:
> > 
> > Just curious: how does this work if there's an interrupt (or NMI) right after the 
> > invalidate instruction but before the 'll' instruction? The IRQ/NMI may refill the 
> > L1. Or are the two instructions coupled by hw (they form a single instruction in 
> > essence) and irqs/NMIs are inhibited inbetween?
> > 
> 
> http://en.wikipedia.org/wiki/Load-link/store-conditional

Oh, ll/sc, that indeed clicks - I even wrote such assembly code many years ago ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-17  3:19                                                                 ` H. Peter Anvin
@ 2011-02-17 16:03                                                                   ` Mathieu Desnoyers
  0 siblings, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-17 16:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Masami Hiramatsu, Will Newton, Steven Rostedt, Will Simoneau,
	David Miller, matt, peterz, jbaron, mingo, tglx, roland, rth,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh,
	2nddept-manager

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 02/16/2011 05:55 PM, Masami Hiramatsu wrote:
> > 
> > Hmm, I think that is miss-coding ll/sc.
> > If I understand correctly, usually cache invalidation should be done
> > right before storing value, as MSI protocol does.
> > (or, sc should atomically invalidate the cache line)
> > 
> 
> I suspect in this case one should flush the cache line before ll (a
> cache flush will typically invalidate the ll/sc link.)

hrm, but if you have:

  invalidate
  -> interrupt
     read (fetch the invalidated cacheline)
  ll
  sc

you basically end up in a situation similar to not having any
invalidate, no? AFAIU, disabling interrupts around the whole
ll-sc-invalidate (or invalidate-ll-sc) sequence seems required for this
specific architecture, so that the invalidation is made "atomic" with the
ll-sc pair from the point of view of one hardware thread.
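
To make the interleaving above concrete, here is a toy, single-threaded
model in C. It is only a sketch under stated assumptions: one memory word
with one shared cache line in front of it, ordinary loads going through
the cache, and the ll/sc pair bypassing it. Every name (toy_line,
toy_read, toy_llsc_xchg, ...) is made up for illustration and is not the
real hardware interface or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one memory word plus one shared cache line in front of it. */
struct toy_line {
	int  mem;	/* value held in memory */
	int  cached;	/* value held in the cache line */
	bool valid;	/* is the cache line currently valid? */
};

/* Ordinary load: goes through the cache, refilling it on a miss. */
static int toy_read(struct toy_line *l)
{
	assert(l);
	if (!l->valid) {
		l->cached = l->mem;
		l->valid = true;
	}
	return l->cached;
}

/* Manual cache-line invalidate. */
static void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
}

/* ll/sc exchange: ASSUMED to bypass the cache and update memory directly. */
static int toy_llsc_xchg(struct toy_line *l, int newval)
{
	int old = l->mem;

	l->mem = newval;
	return old;
}

/*
 * invalidate, then an "interrupt" reads the variable (refilling the
 * line), then ll/sc.  Returns what a later ordinary load observes.
 */
static int toy_xchg_interrupted(struct toy_line *l, int newval)
{
	toy_invalidate(l);
	(void)toy_read(l);		/* interrupt handler touches the line */
	toy_llsc_xchg(l, newval);
	return toy_read(l);		/* served from the stale line */
}

/*
 * Same exchange with nothing allowed to run between ll/sc and the
 * invalidate, as if interrupts were disabled around the sequence.
 */
static int toy_xchg_irqs_off(struct toy_line *l, int newval)
{
	toy_llsc_xchg(l, newval);
	toy_invalidate(l);
	return toy_read(l);		/* refilled from memory */
}
```

In this model the interrupted sequence leaves later ordinary loads
reading the old value, while the uninterrupted one does not, which is
the reason for wanting interrupts off around the whole sequence.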

Mathieu

> 
> 	-hpa
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-17  3:36                                                               ` Steven Rostedt
@ 2011-02-17 16:13                                                                 ` Mathieu Desnoyers
       [not found]                                                                 ` <BLU0-SMTP51D40A5B1DACA8883D6AB596D50@phx.gbl>
  1 sibling, 0 replies; 113+ messages in thread
From: Mathieu Desnoyers @ 2011-02-17 16:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Will Newton, Will Simoneau, David Miller, hpa, matt, peterz,
	jbaron, mingo, tglx, roland, rth, masami.hiramatsu.pt, fweisbec,
	avi, sam, ddaney, michael, linux-kernel, vapier, cmetcalf,
	dhowells, schwidefsky, heiko.carstens, benh

* Steven Rostedt (rostedt@goodmis.org) wrote:
> [ Removed Andi as I believe this is the mysterious thread he was talking
> about. Anyone else want to be removed? ]
> 
> 
> On Wed, 2011-02-16 at 08:24 -0500, Mathieu Desnoyers wrote:
> > * Will Newton (will.newton@gmail.com) wrote:
> 
> > initially:
> > foo = 0
> > bar = 0
> > 
> > CPU A                            CPU B
> > 
> > xchg(&foo, 1);
> >   ll foo
> >   sc foo
> > 
> >   -> interrupt
> > 
> >   if (foo == 1)
> >     xchg(&bar, 1);
> >       ll bar
> >       sc bar
> >       invalidate bar
> > 
> >                                  lbar = bar;
> >                                  smp_mb()
> 
> Question: Does a mb() flush all cache or does it just make sure that
> read/write operations finish before starting new ones?

AFAIK, the Linux kernel memory model semantics only care about coherent
caches (I'd be interested to learn if I am wrong here). Therefore,
smp_mb() affects ordering of data memory reads/writes only, not cache
invalidation -- _however_, this applies only in a memory model where the
underlying accesses are performed on coherent caches.

> 
> >                                  lfoo = foo;
> 
> IOW, will that smp_mb() really make lfoo read the new foo in memory? If
> foo happens to still be in cache and no coherency has been performed to
> flush it, would it just simply read foo straight from the cache?

If we were to deploy the Linux kernel on an architecture without
coherent caches, I think smp_mb() should imply a cacheline invalidation,
otherwise we completely mess up the order of data writes vs their
observability from each individual core's point of view.

This is what I do in liburcu actually. I introduced a "smp_mc() (mc for
memory commit)" macro to specify that cache invalidation is required on
non-cache-coherent archs. smp_mb() implies a smp_mc(). (smp_mc() is
therefore weaker than smp_mb(), because the mb implies ordering of memory
operations performed by a given core, while smp_mc() only ensures that
the core caches are synchronized with memory.)

Thanks,

Mathieu

> 
> -- Steve
> 
> >                                  BUG_ON(lbar == 1 && lfoo == 0);
> >   invalidate foo
> > 
> > It should be valid to expect that every time "bar" read by CPU B is 1,
> > then "foo" is always worth 1. However, in this case, the lack of
> > invalidate on foo is keeping the cacheline from reaching CPU B. There
> > seems to be a problem with interrupts/NMIs coming right between sc and
> > invalidate, as Ingo pointed out.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
       [not found]                                                                 ` <BLU0-SMTP51D40A5B1DACA8883D6AB596D50@phx.gbl>
@ 2011-02-17 20:09                                                                   ` Steven Rostedt
  0 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2011-02-17 20:09 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Newton, Will Simoneau, David Miller, hpa, matt, peterz,
	jbaron, mingo, tglx, roland, rth, masami.hiramatsu.pt, fweisbec,
	avi, sam, ddaney, michael, linux-kernel, vapier, cmetcalf,
	dhowells, schwidefsky, heiko.carstens, benh

On Thu, 2011-02-17 at 11:13 -0500, Mathieu Desnoyers wrote:

> > 
> > >                                  lfoo = foo;
> > 
> > IOW, will that smp_mb() really make lfoo read the new foo in memory? If
> > foo happens to still be in cache and no coherency has been performed to
> > flush it, would it just simply read foo straight from the cache?
> 
> If we were to deploy the Linux kernel on an architecture without
> coherent caches, I think smp_mb() should imply a cacheline invalidation,
> otherwise we completely mess up the order of data writes vs their
> observability from each individual core POV.

Um, but this thread is not about non-coherent caches. It's about HW
that happens to do something stupid with ll/sc. That is, everything
deals with the cache except ll/sc, which skips it.

Although, this was more or less answered in another email. That is, the
cache on this HW is not really coherent but all the CPUs just seem to
share the same cache. Thus an invalidate of the cache line affects all
CPUs, which makes my question moot.

-- Steve



^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] jump label: 2.6.38 updates
  2011-02-15 11:53                                                 ` Will Newton
@ 2011-02-18 19:03                                                   ` Paul E. McKenney
  0 siblings, 0 replies; 113+ messages in thread
From: Paul E. McKenney @ 2011-02-18 19:03 UTC (permalink / raw)
  To: Will Newton
  Cc: Mathieu Desnoyers, Matt Fleming, David Miller, rostedt, peterz,
	jbaron, hpa, mingo, tglx, andi, roland, rth, masami.hiramatsu.pt,
	fweisbec, avi, sam, ddaney, michael, linux-kernel, vapier,
	cmetcalf, dhowells, schwidefsky, heiko.carstens, benh

On Tue, Feb 15, 2011 at 11:53:37AM +0000, Will Newton wrote:
> On Mon, Feb 14, 2011 at 11:09 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> 
> Hi Paul,
> 
> > What CPU family are we talking about here?  For cache coherent CPUs,
> > cache coherence really is supposed to work, even for mixed atomic and
> > non-atomic instructions to the same variable.
> 
> Is there a specific situation you can think of where this would be a
> problem? I have to admit to a certain amount of unease with the design
> our hardware guys came up with, but I don't have a specific case where
> it won't work, just cases where it is less than optimal.

OK, you did ask...

One case is when a given block of memory was subject to atomic
instructions, then was freed and reallocated as a structure used by
normal instructions.  It would be quite bad if the last pre-free atomic
operation failed to play nice with the first post-allocate non-atomic
instruction.  The reverse situation is of course important as well,
where a block subject to non-atomic instructions is freed and reallocated
as a structure subject to atomic instructions.  I would guess you would
handle these cases by making the memory allocator deal with any hardware
caching issues, but however it is handled, it does need to be handled.

Another case is a leaky-bucket token protocol, where there is a rate
limit of some sort.  There is an integer that is positive when progress
is permitted, and negative otherwise.  This integer is periodically reset
to its upper limit, and this reset operation can use a non-atomic store.
When attempting to carry out a rate-limited operation, you can use
atomic_add_return() if underflow cannot happen, but you must use
cmpxchg() if underflow is a possibility.  Now you -could- use atomic
xchg() to reset the integer, but you don't have to.  You -could- also
use atomic_cmpxchg() to check and atomic_set() to reset the limit, but
again, you don't have to.  And there might well be places in the Linux
kernel that mix atomic and non-atomic operations in this case.
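
A minimal userspace sketch of such a token bucket, written with C11
atomics rather than the kernel's atomic_t API.  The names (struct bucket,
bucket_reset(), bucket_take_add(), bucket_take_cmpxchg()) are
hypothetical, and the relaxed atomic store in bucket_reset() only stands
in for the plain non-atomic store described above, since portable C11
does not allow mixing plain and atomic accesses to the same object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct bucket {
	atomic_int tokens;	/* positive: progress is permitted */
	int limit;		/* upper limit restored by each reset */
};

static void bucket_init(struct bucket *b, int limit)
{
	assert(limit > 0);
	b->limit = limit;
	atomic_init(&b->tokens, limit);
}

/* Periodic reset: stands in for the non-atomic store in the text. */
static void bucket_reset(struct bucket *b)
{
	atomic_store_explicit(&b->tokens, b->limit, memory_order_relaxed);
}

/*
 * Variant for when underflow cannot happen: an unconditional atomic
 * decrement, in the spirit of atomic_add_return(-1, ...).  The counter
 * is allowed to go negative.
 */
static bool bucket_take_add(struct bucket *b)
{
	int old = atomic_fetch_sub(&b->tokens, 1);

	return old > 0;
}

/*
 * Variant for when underflow is possible: a compare-and-exchange loop
 * that never decrements a non-positive counter.
 */
static bool bucket_take_cmpxchg(struct bucket *b)
{
	int old = atomic_load_explicit(&b->tokens, memory_order_relaxed);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&b->tokens, &old, old - 1))
			return true;
	}
	return false;
}
```

The point of the example is that bucket_reset() and the two take
functions hit the same integer, so the plain store and the atomic
read-modify-write operations have to agree on its value.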

Yet another case is a variation on the lockless queue that can have
concurrent enqueues but where only one task may dequeue at a time, for
example, dequeuing might be guarded by a lock.  Suppose that dequeues
removed all the elements on the queue at one shot.  Such a queue might
have a head and tail pointer, where the tail pointer references the
->next pointer of the last element, or references the head pointer if
the queue is empty.  Each element also has a flag that indicates whether
it is a normal element or a dummy element.

Enqueues are handled in the normal way for this sort of queue:

1.	Initialize the element to be added, including NULLing out
	its ->next pointer.

2.	Atomically exchange the queue's tail pointer with a pointer
	to the element's ->next pointer, placing the old tail pointer
	into a local variable (call it "oldtail").

3.	Nonatomically set the pointer referenced by oldtail to
	point to the newly added element.

Then a bulk dequeue could be written as follows:

1.	Pick up the head pointer, placing it in a local variable
	(call it "oldhead").  If NULL, return an empty list, otherwise
	continue through the following steps.

2.	Store NULL into the head pointer.  This can be done nonatomically,
	because no one else will be concurrently storing into this
	pointer -- there is at least one element on the list, and so
	the enqueuers will be instead storing to the ->next pointer
	of the last element.

3.	Atomically exchange the queue's tail pointer with a pointer
	to the queue's head pointer, placing the old value of the
	tail pointer into a local variable (again, call it "oldtail").

4.	Return a list with oldhead as the head pointer and oldtail
	as the tail pointer.

	The caller cannot rely on NULL pointers to find the end of the
	list, as an enqueuer might be delayed between steps 2 and 3.
	Instead, the caller must check to see if the address of the NULL
	pointer is equal to oldtail, in which case, the caller has in
	fact reached the end of the list.  Otherwise, the caller must
	wait for the pointer to become non-NULL.
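
The enqueue and bulk-dequeue steps above can be sketched in userspace C11
as follows.  This is an illustration under assumptions, not code from the
kernel: the type and function names are hypothetical, the dummy-element
flag is left out, and the stores and loads that the text describes as
non-atomic are written as relaxed/release/acquire atomic accesses because
portable C11 requires that for concurrently accessed objects.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node;
typedef _Atomic(struct node *) node_ptr;	/* atomic pointer to a node */

struct node {
	node_ptr next;
	int value;
};

struct queue {
	node_ptr head;
	_Atomic(node_ptr *) tail;	/* &head if empty, else &last->next */
};

/* Result of a bulk dequeue: first element plus the old tail pointer. */
struct batch {
	struct node *head;
	node_ptr *tail;
};

static void queue_init(struct queue *q)
{
	atomic_init(&q->head, NULL);
	atomic_init(&q->tail, &q->head);
}

static void enqueue(struct queue *q, struct node *n)
{
	assert(n);
	/* Enqueue step 1: NULL out the new element's ->next pointer. */
	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	/* Enqueue step 2: atomically swing the tail to &n->next. */
	node_ptr *oldtail = atomic_exchange(&q->tail, &n->next);
	/* Enqueue step 3: link the new element in ("non-atomic" store). */
	atomic_store_explicit(oldtail, n, memory_order_release);
}

/* The caller is assumed to hold whatever lock serializes dequeuers. */
static struct batch dequeue_all(struct queue *q)
{
	struct batch b = { NULL, NULL };
	/* Dequeue step 1: pick up the head; empty list if NULL. */
	struct node *oldhead =
		atomic_load_explicit(&q->head, memory_order_acquire);

	if (!oldhead)
		return b;
	/* Dequeue step 2: clear the head ("non-atomic" store). */
	atomic_store_explicit(&q->head, NULL, memory_order_relaxed);
	/* Dequeue step 3: atomically point the tail back at the head. */
	b.tail = atomic_exchange(&q->tail, &q->head);
	/* Dequeue step 4: hand back the detached list. */
	b.head = oldhead;
	return b;
}

/* Walk a batch, using the old tail (not NULL) to find the end. */
static int batch_sum(struct batch b)
{
	int sum = 0;
	struct node *n = b.head;

	while (n) {
		struct node *next;

		sum += n->value;
		if (&n->next == b.tail)
			break;		/* reached the real end of the list */
		/* A delayed enqueuer has not linked its element yet: wait. */
		while (!(next = atomic_load_explicit(&n->next,
						     memory_order_acquire)))
			;
		n = next;
	}
	return sum;
}
```

Note how batch_sum() compares the address of each ->next pointer against
the old tail instead of stopping at the first NULL.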

Yes, you can replace the non-atomic loads and stores in the enqueuer's
step #3 and in the bulk dequeue's steps #1 and #2 with atomic exchange
instructions -- in fact you can replace either or both.  And you could
also require that the caller use atomic instructions when looking at
each element's ->next pointer.

There are other algorithms, but this should be a decent start.

And yes, you -can- make these algorithms use only atomic instructions, but
you don't -have- to.  So it is quite likely that similar algorithms exist
somewhere in the 10+ million lines of code making up the Linux kernel.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 113+ messages in thread

end of thread

Thread overview: 113+ messages
2011-01-05 15:43 [PATCH 0/2] jump label: 2.6.38 updates Jason Baron
2011-01-05 15:43 ` [PATCH 1/2] jump label: make enable/disable o(1) Jason Baron
2011-01-05 17:31   ` Steven Rostedt
2011-01-05 21:19     ` Jason Baron
2011-01-05 15:43 ` [PATCH 2/2] jump label: introduce static_branch() Jason Baron
2011-01-05 17:15   ` Frederic Weisbecker
2011-01-05 17:46     ` Steven Rostedt
2011-01-05 18:52       ` H. Peter Anvin
2011-01-05 21:19         ` Jason Baron
2011-01-05 21:14     ` Jason Baron
2011-01-05 17:32   ` David Daney
2011-01-05 17:43     ` Steven Rostedt
2011-01-05 18:44       ` David Miller
2011-01-05 20:04         ` Steven Rostedt
2011-01-05 18:56       ` H. Peter Anvin
2011-01-05 19:14         ` Ingo Molnar
2011-01-05 19:32           ` David Daney
2011-01-05 19:50             ` Ingo Molnar
2011-01-05 20:07               ` David Daney
2011-01-05 20:08                 ` H. Peter Anvin
2011-01-05 20:18                 ` Ingo Molnar
2011-01-05 21:16     ` Jason Baron
2011-01-05 17:41   ` Steven Rostedt
2011-01-09 18:48   ` Mathieu Desnoyers
2011-02-11 19:25 ` [PATCH 0/2] jump label: 2.6.38 updates Peter Zijlstra
2011-02-11 21:13   ` Mathieu Desnoyers
     [not found]   ` <BLU0-SMTP101B686C32E10BA346B15F896EF0@phx.gbl>
2011-02-11 21:38     ` Peter Zijlstra
2011-02-11 22:15       ` Jason Baron
2011-02-11 22:19         ` H. Peter Anvin
2011-02-11 22:30         ` Mathieu Desnoyers
2011-02-11 22:20       ` Mathieu Desnoyers
     [not found]       ` <BLU0-SMTP8562BA758CF8AAE5323AE296EF0@phx.gbl>
2011-02-11 22:27         ` Jason Baron
2011-02-11 22:32           ` Mathieu Desnoyers
2011-02-12 18:47       ` Peter Zijlstra
2011-02-14 12:27         ` Ingo Molnar
2011-02-14 15:51         ` Jason Baron
2011-02-14 15:57           ` Peter Zijlstra
2011-02-14 16:04             ` Jason Baron
2011-02-14 16:14               ` Mathieu Desnoyers
     [not found]               ` <BLU0-SMTP4069A1A89F06CDFF9B28F896D00@phx.gbl>
2011-02-14 16:25                 ` Peter Zijlstra
2011-02-14 16:29                   ` Jason Baron
2011-02-14 16:37                     ` Peter Zijlstra
2011-02-14 16:43                       ` Mathieu Desnoyers
2011-02-14 16:46                       ` Steven Rostedt
2011-02-14 16:53                         ` Peter Zijlstra
2011-02-14 17:18                         ` Steven Rostedt
2011-02-14 17:23                           ` Mike Frysinger
2011-02-14 17:27                           ` Peter Zijlstra
2011-02-14 17:29                             ` Mike Frysinger
2011-02-14 17:38                               ` Peter Zijlstra
2011-02-14 17:45                                 ` Mike Frysinger
2011-02-14 17:38                             ` Will Newton
2011-02-14 17:43                               ` Peter Zijlstra
2011-02-14 17:50                                 ` Will Newton
2011-02-14 18:04                                   ` Peter Zijlstra
2011-02-14 18:24                                   ` Peter Zijlstra
2011-02-14 18:53                                     ` Mathieu Desnoyers
2011-02-14 21:29                                     ` Steven Rostedt
2011-02-14 21:39                                       ` Steven Rostedt
2011-02-14 21:46                                         ` David Miller
2011-02-14 22:20                                           ` Steven Rostedt
2011-02-14 22:21                                             ` Steven Rostedt
2011-02-14 22:21                                             ` H. Peter Anvin
2011-02-14 22:29                                               ` Mathieu Desnoyers
     [not found]                                               ` <BLU0-SMTP98BFCC52FD41661DD9CC1E96D00@phx.gbl>
2011-02-14 22:33                                                 ` David Miller
2011-02-14 22:33                                             ` David Miller
2011-02-14 22:37                                           ` Matt Fleming
2011-02-14 23:03                                             ` Mathieu Desnoyers
     [not found]                                             ` <BLU0-SMTP166A8555C791786059B0FF96D00@phx.gbl>
2011-02-14 23:09                                               ` Paul E. McKenney
2011-02-14 23:29                                                 ` Mathieu Desnoyers
     [not found]                                                 ` <BLU0-SMTP4599FAAD7330498472B87396D00@phx.gbl>
2011-02-15  0:19                                                   ` Segher Boessenkool
2011-02-15  0:48                                                     ` Mathieu Desnoyers
2011-02-15  1:29                                                     ` Steven Rostedt
     [not found]                                                 ` <BLU0-SMTP984E876DBDFBC13F4C86F896D00@phx.gbl>
2011-02-15  0:42                                                   ` Paul E. McKenney
2011-02-15  0:51                                                     ` Mathieu Desnoyers
2011-02-15 11:53                                                 ` Will Newton
2011-02-18 19:03                                                   ` Paul E. McKenney
2011-02-14 23:19                                             ` H. Peter Anvin
2011-02-15 11:01                                               ` Will Newton
2011-02-15 13:31                                                 ` H. Peter Anvin
2011-02-15 13:49                                                   ` Steven Rostedt
2011-02-15 14:04                                                   ` Will Newton
2011-02-15 21:11                                                 ` Will Simoneau
2011-02-15 21:27                                                   ` David Miller
2011-02-15 21:56                                                     ` Will Simoneau
2011-02-16 10:15                                                       ` Will Newton
2011-02-16 12:18                                                         ` Steven Rostedt
2011-02-16 12:41                                                           ` Will Newton
2011-02-16 13:24                                                             ` Mathieu Desnoyers
2011-02-16 22:51                                                             ` Will Simoneau
2011-02-17  0:53                                                               ` Please watch your cc lists Andi Kleen
2011-02-17  0:56                                                                 ` David Miller
2011-02-17  1:04                                                                   ` Michael Witten
2011-02-17 10:55                                                               ` [PATCH 0/2] jump label: 2.6.38 updates Will Newton
     [not found]                                                             ` <BLU0-SMTP80F56386E7E060A3B2020B96D20@phx.gbl>
2011-02-17  1:55                                                               ` Masami Hiramatsu
2011-02-17  3:19                                                                 ` H. Peter Anvin
2011-02-17 16:03                                                                   ` Mathieu Desnoyers
     [not found]                                                             ` <BLU0-SMTP71BCB155CBAE79997EE08D96D20@phx.gbl>
2011-02-17  3:36                                                               ` Steven Rostedt
2011-02-17 16:13                                                                 ` Mathieu Desnoyers
     [not found]                                                                 ` <BLU0-SMTP51D40A5B1DACA8883D6AB596D50@phx.gbl>
2011-02-17 20:09                                                                   ` Steven Rostedt
2011-02-15 22:20                                                     ` Benjamin Herrenschmidt
2011-02-16  8:35                                                       ` Ingo Molnar
2011-02-17  1:04                                                         ` H. Peter Anvin
2011-02-17 12:51                                                           ` Ingo Molnar
     [not found]                                             ` <BLU0-SMTP637B2E9372CFBF3A0B5B0996D00@phx.gbl>
2011-02-14 23:25                                               ` David Miller
2011-02-14 23:34                                                 ` Mathieu Desnoyers
     [not found]                                                 ` <20110214233405.GC17432@Krystal>
2011-02-14 23:52                                                   ` Mathieu Desnoyers
2011-02-14 22:15                                         ` Matt Fleming
2011-02-15 15:20                             ` Heiko Carstens
     [not found]                       ` <BLU0-SMTP64371A838030ED92A7CCB696D00@phx.gbl>
2011-02-14 18:54                         ` Jason Baron
2011-02-14 19:20                           ` Peter Zijlstra
2011-02-14 19:48                             ` Mathieu Desnoyers
2011-02-14 16:11         ` Mathieu Desnoyers
