linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v2 0/4] dynamic indirect call promotion
@ 2019-02-02  0:05 Edward Cree
  2019-02-02  0:07 ` [RFC PATCH v2 1/4] static_call: add indirect call promotion (dynamic_call) infrastructure Edward Cree
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Edward Cree @ 2019-02-02  0:05 UTC (permalink / raw)
  To: namit, jpoimboe; +Cc: linux-kernel, x86

This series introduces 'dynamic_calls', branch trees of static calls (updated
 at runtime using text patching), to avoid making indirect calls to common
 targets.  The basic mechanism is
    if (func == static_key_1.target)
        call_static_key_1(args);
    else if (func == static_key_2.target)
        call_static_key_2(args);
    /* ... */
    else
        (*func)(args); /* typically leads to a retpoline nowadays */
 with some additional statistics-gathering to allow periodic relearning of
 branch targets.  Creating and calling a dynamic call table are each a single
 line in the consuming code, although they expand to a nontrivial amount of
 data and text in the kernel image.
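For concreteness, the declaration and call site that patch 2 adds for the RX
 path's pt_prev->func() look like this (slightly abridged):
    DYNAMIC_CALL_4(int, deliver_skb, struct sk_buff *, struct net_device *,
                   struct packet_type *, struct net_device *);
    /* ... and in the RX path, which already runs under rcu_read_lock(): */
    ret = dynamic_deliver_skb(pt_prev->func, skb, skb->dev, pt_prev, orig_dev);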
This is essentially indirect branch prediction, performed in software because
 we can't trust hardware to get it right.  While the processor may speculate
 into the function calls, this is *probably* OK since they are known to be
 functions that frequently get called in this path, and thus are much less
 likely to contain side-channel information leaks than a completely arbitrary
 target PC from a branch target buffer.  Moreover, when the speculation is
 accurate we positively want to be able to speculate into the callee.
The branch target statistics are collected with percpu variables, counting
 both 'hits' on the existing branch targets and 'misses', divided into counts
 for up to four specific targets (first-come-first-served) and a catch-all
 miss count used once that table is full.
When the total number of specific misses on a cpu reaches 1000, work is
 triggered which adds up counts across all CPUs and chooses the two most-
 popular call targets to patch into the call path.
If instead the catch-all miss count reaches 1000, the counts and specific
 targets for that cpu are discarded, since either the target is too
 unpredictable (lots of low-frequency callees rather than a few dominating
 ones) or the targets that populated the table were by chance unpopular ones.
To ensure that the static key target does not change between the if () check
 and the call, the whole dynamic_call must take place in an RCU read-side
 critical section (which, since the callee does not know it is being called in
 this manner, then lasts at least until the callee returns), and the patching
 at re-learning time is done with the help of a static_key to switch callers
 off the dynamic_call path and RCU synchronisation to ensure none are still on
 it.  In cases where RCU cannot be used (e.g. because some callees need to RCU
 synchronise), it might be possible to add a variant that uses
 synchronize_rcu_tasks() when updating, but this series does not attempt this.
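Concretely, the relearning update in patch 1 boils down to the following
 sequence (eliding the statistics bookkeeping):
    static_branch_enable(dc->skip_fast);   /* divert callers off the fast path */
    synchronize_rcu();                     /* wait out callers already on it */
    for (i = 0; i < DYNAMIC_CALL_BRANCHES; i++)
        __static_call_update(dc->key[i], top[i].func);
    static_branch_disable(dc->skip_fast);  /* switch callers back onto it */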

The dynamic_calls created by this series are opt-in, partly because of the
 abovementioned rcu_read_lock requirement.

My attempts to measure the performance impact of dynamic_calls have been
 inconclusive; the effects on an RX-side UDP packet rate test were within
 ±1.5% and nowhere near statistical significance (p around 0.2-0.3 with n=6
 in a Welch t-test).  This could mean that dynamic_calls are ineffective,
 but it could also mean that many more sites need converting before any gain
 shows up, or it could just mean that my testing was insufficiently sensitive
 or measuring the wrong thing.  Given these poor results, this series is
 clearly not 'ready', hence the RFC tags, but hopefully it will inform the
 discussion in this area.

As before, this series depends on Josh's "static calls" patch series (v3 this
 time).  My testing was done with out-of-line static calls, since the inline
 implementation led to crashes; I have not yet determined whether they were
 the fault of my patch or of the static calls series.

Edward Cree (4):
  static_call: add indirect call promotion (dynamic_call) infrastructure
  net: core: use a dynamic_call for pt_prev->func() in RX path
  net: core: use a dynamic_call for dst_input
  net: core: use a dynamic_call for pt_prev->list_func() in list RX path

 include/linux/dynamic_call.h | 300 +++++++++++++++++++++++++++++++++++++++++++
 include/net/dst.h            |   5 +-
 init/Kconfig                 |  11 ++
 kernel/Makefile              |   1 +
 kernel/dynamic_call.c        | 131 +++++++++++++++++++
 net/core/dev.c               |  18 ++-
 net/core/dst.c               |   2 +
 7 files changed, 463 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/dynamic_call.h
 create mode 100644 kernel/dynamic_call.c



* [RFC PATCH v2 1/4] static_call: add indirect call promotion (dynamic_call) infrastructure
  2019-02-02  0:05 [RFC PATCH v2 0/4] dynamic indirect call promotion Edward Cree
@ 2019-02-02  0:07 ` Edward Cree
  2019-02-02  0:07 ` [RFC PATCH v2 2/4] net: core: use a dynamic_call for pt_prev->func() in RX path Edward Cree
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2019-02-02  0:07 UTC (permalink / raw)
  To: namit, jpoimboe; +Cc: linux-kernel, x86

Uses runtime instrumentation of callees from an indirect call site to
 populate an indirect-call-wrapper branch tree.  Essentially we're doing
 indirect branch prediction in software because the hardware can't be
 trusted to get it right; this is sad.
Calls to these trampolines must take place within an RCU read-side
 critical section.  This is necessary because we use RCU synchronisation
 to ensure that no CPUs are running the fast path while we patch it;
 otherwise they could be between checking a static_call's func and
 actually calling it, and end up calling the wrong function.  The use
 of RCU as the synchronisation method means that dynamic_calls cannot be
 used for functions which call synchronize_rcu(), thus the mechanism has
 to be opt-in rather than being automatically applied to all indirect
 calls in the kernel.
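A caller-side sketch of the resulting contract (the names here are purely
 illustrative):
    rcu_read_lock();
    ret = dynamic_foo(ops->foo, arg); /* patched targets stay stable until unlock */
    rcu_read_unlock();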

Enabled by new CONFIG_DYNAMIC_CALLS, which defaults to off (and depends
 on a static_call implementation being available).

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/dynamic_call.h | 300 +++++++++++++++++++++++++++++++++++++++++++
 init/Kconfig                 |  11 ++
 kernel/Makefile              |   1 +
 kernel/dynamic_call.c        | 131 +++++++++++++++++++
 4 files changed, 443 insertions(+)
 create mode 100644 include/linux/dynamic_call.h
 create mode 100644 kernel/dynamic_call.c

diff --git a/include/linux/dynamic_call.h b/include/linux/dynamic_call.h
new file mode 100644
index 000000000000..2e84543c0c8b
--- /dev/null
+++ b/include/linux/dynamic_call.h
@@ -0,0 +1,300 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_DYNAMIC_CALL_H
+#define _LINUX_DYNAMIC_CALL_H
+
+/*
+ * Dynamic call (optpoline) support
+ *
+ * Dynamic calls use code patching and runtime learning to promote indirect
+ * calls into direct calls using the static_call machinery.  They give the
+ * flexibility of function pointers, but with improved performance.  This is
+ * especially important for cases where retpolines would otherwise be used, as
+ * retpolines can significantly impact performance.
+ * Calls to the two callees learned to be most common will be made through
+ * static_calls, while for any other callee the trampoline will fall back to an
+ * indirect call (or a retpoline, if those are enabled).
+ * Patching of newly learned callees into the fast-path relies on RCU to ensure
+ * the fast-path is not in use on any CPU; thus the calls must be made under
+ * the RCU read lock.
+ *
+ *
+ * A dynamic call table must be defined in file scope with
+ *	DYNAMIC_CALL_$NR(ret, name, type1, ..., type$NR);
+ * where $NR is from 1 to 4, ret is the return type of the function and type1
+ * through type$NR are the argument types.  Then, calls can be made through a
+ * matching function pointer 'func' with
+ *	x = dynamic_name(func, arg1, ..., arg$NR);
+ * which will behave equivalently to
+ *	(*func)(arg1, ..., arg$NR);
+ * except hopefully with higher performance.  It is allowed for multiple
+ * callsites to use the same dynamic call table, in which case they will share
+ * statistics for learning.  This will perform well as long as the callsites
+ * typically have the same set of common callees.
+ *
+ * Usage example:
+ *
+ *	struct foo {
+ *		int x;
+ *		int (*f)(int);
+ *	};
+ *	DYNAMIC_CALL_1(int, handle_foo, int);
+ *
+ *	int handle_foo(struct foo *f)
+ *	{
+ *		return dynamic_handle_foo(f->f, f->x);
+ *	}
+ *
+ * This should behave the same as if the function body were changed to:
+ *		return (f->f)(f->x);
+ * but potentially with improved performance.
+ */
+
+#define DEFINE_DYNAMIC_CALL_1(_ret, _name, _type1)			       \
+_ret dynamic_##_name(_ret (*func)(_type1), _type1 arg1);
+
+#define DEFINE_DYNAMIC_CALL_2(_ret, _name, _type1, _type2)		       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2), _type1 arg1, _type2 arg2);
+
+#define DEFINE_DYNAMIC_CALL_3(_ret, _name, _type1, _type2, _type3)	       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2, _type3), _type1 arg1,	       \
+		     _type2 arg2, _type3 arg3);
+
+#define DEFINE_DYNAMIC_CALL_4(_ret, _name, _type1, _type2, _type3, _type4)     \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2, _type3, _type4), _type1 arg1,\
+		     _type2 arg2, _type3 arg3, _type4 arg4);
+
+#ifdef CONFIG_DYNAMIC_CALLS
+
+#include <linux/jump_label.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/static_call.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
+/* Number of callees from the slowpath to track on each CPU */
+#define DYNAMIC_CALL_CANDIDATES	4
+/*
+ * Number of fast-path callees; to change this, much of the macrology below
+ * must also be changed.
+ */
+#define DYNAMIC_CALL_BRANCHES	2
+struct dynamic_call_candidate {
+	void *func;
+	unsigned long hit_count;
+};
+struct dynamic_call_percpu {
+	struct dynamic_call_candidate candidates[DYNAMIC_CALL_CANDIDATES];
+	unsigned long hit_count[DYNAMIC_CALL_BRANCHES];
+	unsigned long miss_count;
+};
+struct dynamic_call {
+	struct work_struct update_work;
+	struct static_key_false *skip_stats;
+	struct static_key_true *skip_fast;
+	struct static_call_key *key[DYNAMIC_CALL_BRANCHES];
+	struct dynamic_call_percpu __percpu *percpu;
+	struct mutex update_lock;
+};
+
+void dynamic_call_update(struct work_struct *work);
+
+
+#define __DYNAMIC_CALL_BITS(_ret, _name, ...)				       \
+static _ret dummy_##_name(__VA_ARGS__)					       \
+{									       \
+	BUG();								       \
+}									       \
+DEFINE_STATIC_KEY_TRUE(_name##_skip_fast);				       \
+DEFINE_STATIC_KEY_FALSE(_name##_skip_stats);				       \
+DEFINE_STATIC_CALL(dynamic_##_name##_1, dummy_##_name);			       \
+DEFINE_STATIC_CALL(dynamic_##_name##_2, dummy_##_name);			       \
+DEFINE_PER_CPU(struct dynamic_call_percpu, _name##_dc_pc);		       \
+									       \
+static struct dynamic_call _name##_dc = {				       \
+	.update_work = __WORK_INITIALIZER(_name##_dc.update_work,	       \
+					  dynamic_call_update),		       \
+	.skip_stats = &_name##_skip_stats,				       \
+	.skip_fast = &_name##_skip_fast,				       \
+	.key = {&dynamic_##_name##_1, &dynamic_##_name##_2},		       \
+	.percpu = &_name##_dc_pc,					       \
+	.update_lock = __MUTEX_INITIALIZER(_name##_dc.update_lock),	       \
+};
+
+#define __DYNAMIC_CALL_STATS(_name)					       \
+	if (static_branch_unlikely(&_name##_skip_stats))		       \
+		goto skip_stats;					       \
+	for (i = 0; i < DYNAMIC_CALL_CANDIDATES; i++)			       \
+		if (func == thiscpu->candidates[i].func) {		       \
+			thiscpu->candidates[i].hit_count++;		       \
+			break;						       \
+		}							       \
+	if (i == DYNAMIC_CALL_CANDIDATES) /* no match */		       \
+		for (i = 0; i < DYNAMIC_CALL_CANDIDATES; i++)		       \
+			if (!thiscpu->candidates[i].func) {		       \
+				thiscpu->candidates[i].func = func;	       \
+				thiscpu->candidates[i].hit_count = 1;	       \
+				break;					       \
+			}						       \
+	if (i == DYNAMIC_CALL_CANDIDATES) /* no space */		       \
+		thiscpu->miss_count++;					       \
+									       \
+	for (i = 0; i < DYNAMIC_CALL_CANDIDATES; i++)			       \
+		total_count += thiscpu->candidates[i].hit_count;	       \
+	if (total_count > 1000) /* Arbitrary threshold */		       \
+		schedule_work(&_name##_dc.update_work);			       \
+	else if (thiscpu->miss_count > 1000) {				       \
+		/* Many misses, few hits: let's roll the dice again for a      \
+		 * fresh set of candidates.				       \
+		 */							       \
+		memset(thiscpu->candidates, 0, sizeof(thiscpu->candidates));   \
+		thiscpu->miss_count = 0;				       \
+	}								       \
+skip_stats:
+
+
+#define DYNAMIC_CALL_1(_ret, _name, _type1)				       \
+__DYNAMIC_CALL_BITS(_ret, _name, _type1 arg1)				       \
+									       \
+_ret dynamic_##_name(_ret (*func)(_type1), _type1 arg1)			       \
+{									       \
+	struct dynamic_call_percpu *thiscpu = this_cpu_ptr(_name##_dc.percpu); \
+	unsigned long total_count = 0;					       \
+	int i;								       \
+									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	if (static_branch_unlikely(&_name##_skip_fast))			       \
+		goto skip_fast;						       \
+	if (func == dynamic_##_name##_1.func) {				       \
+		thiscpu->hit_count[0]++;				       \
+		return static_call(dynamic_##_name##_1, arg1);		       \
+	}								       \
+	if (func == dynamic_##_name##_2.func) {				       \
+		thiscpu->hit_count[1]++;				       \
+		return static_call(dynamic_##_name##_2, arg1);		       \
+	}								       \
+									       \
+skip_fast:								       \
+	__DYNAMIC_CALL_STATS(_name)					       \
+	return func(arg1);						       \
+}
+
+#define DYNAMIC_CALL_2(_ret, _name, _type1, _type2)			       \
+__DYNAMIC_CALL_BITS(_ret, _name, _type1 arg1, _type2 arg2)		       \
+									       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2), _type1 arg1,	_type2 arg2)   \
+{									       \
+	struct dynamic_call_percpu *thiscpu = this_cpu_ptr(_name##_dc.percpu); \
+	unsigned long total_count = 0;					       \
+	int i;								       \
+									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	if (static_branch_unlikely(&_name##_skip_fast))			       \
+		goto skip_fast;						       \
+	if (func == dynamic_##_name##_1.func) {				       \
+		thiscpu->hit_count[0]++;				       \
+		return static_call(dynamic_##_name##_1, arg1, arg2);	       \
+	}								       \
+	if (func == dynamic_##_name##_2.func) {				       \
+		thiscpu->hit_count[1]++;				       \
+		return static_call(dynamic_##_name##_2, arg1, arg2);	       \
+	}								       \
+									       \
+skip_fast:								       \
+	__DYNAMIC_CALL_STATS(_name)					       \
+	return func(arg1, arg2);					       \
+}
+
+#define DYNAMIC_CALL_3(_ret, _name, _type1, _type2, _type3)		       \
+__DYNAMIC_CALL_BITS(_ret, _name, _type1 arg1, _type2 arg2, _type3 arg3)        \
+									       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2, _type3), _type1 arg1,	       \
+		     _type2 arg2, _type3 arg3)				       \
+{									       \
+	struct dynamic_call_percpu *thiscpu = this_cpu_ptr(_name##_dc.percpu); \
+	unsigned long total_count = 0;					       \
+	int i;								       \
+									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	if (static_branch_unlikely(&_name##_skip_fast))			       \
+		goto skip_fast;						       \
+	if (func == dynamic_##_name##_1.func) {				       \
+		thiscpu->hit_count[0]++;				       \
+		return static_call(dynamic_##_name##_1, arg1, arg2, arg3);  \
+	}								       \
+	if (func == dynamic_##_name##_2.func) {				       \
+		thiscpu->hit_count[1]++;				       \
+		return static_call(dynamic_##_name##_2, arg1, arg2, arg3);  \
+	}								       \
+									       \
+skip_fast:								       \
+	__DYNAMIC_CALL_STATS(_name)					       \
+	return func(arg1, arg2, arg3);					       \
+}
+
+#define DYNAMIC_CALL_4(_ret, _name, _type1, _type2, _type3, _type4)	       \
+__DYNAMIC_CALL_BITS(_ret, _name, _type1 arg1, _type2 arg2, _type3 arg3,	       \
+		    _type4 arg4)					       \
+									       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2, _type3, _type4), _type1 arg1,\
+		     _type2 arg2, _type3 arg3, _type4 arg4)		       \
+{									       \
+	struct dynamic_call_percpu *thiscpu = this_cpu_ptr(_name##_dc.percpu); \
+	unsigned long total_count = 0;					       \
+	int i;								       \
+									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	if (static_branch_unlikely(&_name##_skip_fast))			       \
+		goto skip_fast;						       \
+	if (func == dynamic_##_name##_1.func) {				       \
+		thiscpu->hit_count[0]++;				       \
+		return static_call(dynamic_##_name##_1, arg1, arg2, arg3, arg4);\
+	}								       \
+	if (func == dynamic_##_name##_2.func) {				       \
+		thiscpu->hit_count[1]++;				       \
+		return static_call(dynamic_##_name##_2, arg1, arg2, arg3, arg4);\
+	}								       \
+									       \
+skip_fast:								       \
+	__DYNAMIC_CALL_STATS(_name)					       \
+	return func(arg1, arg2, arg3, arg4);				       \
+}
+
+#else /* !CONFIG_DYNAMIC_CALLS */
+
+/* Implement as simple indirect calls */
+
+#define DYNAMIC_CALL_1(_ret, _name, _type1)				       \
+_ret dynamic_##_name(_ret (*func)(_type1), _type1 arg1)			       \
+{									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	return func(arg1);						       \
+}									       \
+
+#define DYNAMIC_CALL_2(_ret, _name, _type1, _type2)			       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2), _type1 arg1, _type2 arg2)   \
+{									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	return func(arg1, arg2);					       \
+}									       \
+
+#define DYNAMIC_CALL_3(_ret, _name, _type1, _type2, _type3)		       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2, _type3), _type1 arg1,	       \
+		     _type2 arg2, _type3 arg3)				       \
+{									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	return func(arg1, arg2, arg3);					       \
+}									       \
+
+#define DYNAMIC_CALL_4(_ret, _name, _type1, _type2, _type3, _type4)	       \
+_ret dynamic_##_name(_ret (*func)(_type1, _type2, _type3, _type4), _type1 arg1,\
+		     _type2 arg2, _type3 arg3, _type4 arg4)		       \
+{									       \
+	WARN_ON_ONCE(!rcu_read_lock_held());					       \
+	return func(arg1, arg2, arg3, arg4);				       \
+}									       \
+
+
+#endif /* CONFIG_DYNAMIC_CALLS */
+
+#endif /* _LINUX_DYNAMIC_CALL_H */
diff --git a/init/Kconfig b/init/Kconfig
index 513fa544a134..11133c141c21 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1779,6 +1779,17 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+config DYNAMIC_CALLS
+	bool "Dynamic call optimisation (EXPERIMENTAL)"
+	depends on HAVE_STATIC_CALL
+	help
+	  Say Y here to accelerate selected indirect calls with optpolines,
+	  using runtime learning to populate the optpoline call tables.  This
+	  should improve performance, particularly when retpolines are enabled,
+	  but increases the size of the kernel .text, and on some workloads may
+	  cause the kernel to spend a significant amount of time updating the
+	  call tables.
+
 endmenu		# General setup
 
 source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index 8e1c6ca0f6e7..e6c32ac7e519 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
 obj-$(CONFIG_PADATA) += padata.o
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_JUMP_LABEL) += jump_label.o
+obj-$(CONFIG_DYNAMIC_CALLS) += dynamic_call.o
 obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
 obj-$(CONFIG_TORTURE_TEST) += torture.o
 
diff --git a/kernel/dynamic_call.c b/kernel/dynamic_call.c
new file mode 100644
index 000000000000..4ba2e5cdded3
--- /dev/null
+++ b/kernel/dynamic_call.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/dynamic_call.h>
+#include <linux/printk.h>
+
+static void dynamic_call_add_cand(struct dynamic_call_candidate *top,
+				 size_t ncands,
+				 struct dynamic_call_candidate next)
+{
+	struct dynamic_call_candidate old;
+	int i;
+
+	for (i = 0; i < ncands; i++) {
+		if (next.hit_count > top[i].hit_count) {
+			/* Swap next with top[i], so that the old top[i] can
+			 * shunt along all lower scores
+			 */
+			old = top[i];
+			top[i] = next;
+			next = old;
+		}
+	}
+}
+
+static void dynamic_call_count_hits(struct dynamic_call_candidate *top,
+				   size_t ncands, struct dynamic_call *dc,
+				   int i)
+{
+	struct dynamic_call_candidate next;
+	struct dynamic_call_percpu *percpu;
+	int cpu;
+
+	next.func = dc->key[i]->func;
+	next.hit_count = 0;
+	for_each_online_cpu(cpu) {
+		percpu = per_cpu_ptr(dc->percpu, cpu);
+		next.hit_count += percpu->hit_count[i];
+		percpu->hit_count[i] = 0;
+	}
+
+	dynamic_call_add_cand(top, ncands, next);
+}
+
+void dynamic_call_update(struct work_struct *work)
+{
+	struct dynamic_call *dc = container_of(work, struct dynamic_call,
+					       update_work);
+	struct dynamic_call_candidate top[4], next, *cands, *cands2;
+	struct dynamic_call_percpu *percpu, *percpu2;
+	int cpu, i, cpu2, j;
+
+	memset(top, 0, sizeof(top));
+
+	pr_debug("dynamic_call_update called for %ps\n", dc);
+	mutex_lock(&dc->update_lock);
+	/* We don't stop the other CPUs adding to their counts while this is
+	 * going on; but it doesn't really matter because this is a heuristic
+	 * anyway so we don't care about perfect accuracy.
+	 */
+	/* First count up the hits on the existing static branches */
+	for (i = 0; i < DYNAMIC_CALL_BRANCHES; i++)
+		dynamic_call_count_hits(top, ARRAY_SIZE(top), dc, i);
+	/* Next count up the callees seen in the fallback path */
+	/* Switch off stats collection in the slowpath first */
+	static_branch_enable(dc->skip_stats);
+	synchronize_rcu();
+	for_each_online_cpu(cpu) {
+		percpu = per_cpu_ptr(dc->percpu, cpu);
+		cands = percpu->candidates;
+		for (i = 0; i < DYNAMIC_CALL_CANDIDATES; i++) {
+			next = cands[i];
+			if (next.func == NULL)
+				continue;
+			next.hit_count = 0;
+			for_each_online_cpu(cpu2) {
+				percpu2 = per_cpu_ptr(dc->percpu, cpu2);
+				cands2 = percpu2->candidates;
+				for (j = 0; j < DYNAMIC_CALL_CANDIDATES; j++) {
+					if (cands2[j].func == next.func) {
+						cands2[j].func = NULL;
+						next.hit_count += cands2[j].hit_count;
+						cands2[j].hit_count = 0;
+						break;
+					}
+				}
+			}
+			dynamic_call_add_cand(top, ARRAY_SIZE(top), next);
+		}
+	}
+	/* Record our results (for debugging) */
+	for (i = 0; i < ARRAY_SIZE(top); i++) {
+		if (i < DYNAMIC_CALL_BRANCHES)
+			pr_debug("%ps: selected [%d] %pf, score %lu\n",
+				 dc, i, top[i].func, top[i].hit_count);
+		else
+			pr_debug("%ps: runnerup [%d] %pf, score %lu\n",
+				 dc, i, top[i].func, top[i].hit_count);
+	}
+	/* It's possible that we could have picked up multiple pushes of the
+	 * workitem, so someone already collected most of the count.  In that
+	 * case, don't make a decision based on only a small number of calls.
+	 */
+	if (top[0].hit_count > 250) {
+		/* Divert callers away from the fast path */
+		static_branch_enable(dc->skip_fast);
+		/* Wait for existing fast path callers to finish */
+		synchronize_rcu();
+		/* Patch the chosen callees into the fast path */
+		for (i = 0; i < DYNAMIC_CALL_BRANCHES; i++) {
+			__static_call_update(dc->key[i], top[i].func);
+			/* Clear the hit-counts, they were for the old funcs */
+			for_each_online_cpu(cpu)
+				per_cpu_ptr(dc->percpu, cpu)->hit_count[i] = 0;
+		}
+		/* Ensure the new fast path is seen before we direct anyone
+		 * into it.  This probably isn't necessary (the binary-patching
+		 * framework probably takes care of it) but let's be paranoid.
+		 */
+		wmb();
+		/* Switch callers back onto the fast path */
+		static_branch_disable(dc->skip_fast);
+	} else {
+		pr_debug("%ps: too few hits, not patching\n", dc);
+	}
+
+	/* Finally, re-enable stats gathering in the fallback path. */
+	static_branch_disable(dc->skip_stats);
+
+	mutex_unlock(&dc->update_lock);
+	pr_debug("dynamic_call_update (%ps) finished\n", dc);
+}



* [RFC PATCH v2 2/4] net: core: use a dynamic_call for pt_prev->func() in RX path
  2019-02-02  0:05 [RFC PATCH v2 0/4] dynamic indirect call promotion Edward Cree
  2019-02-02  0:07 ` [RFC PATCH v2 1/4] static_call: add indirect call promotion (dynamic_call) infrastructure Edward Cree
@ 2019-02-02  0:07 ` Edward Cree
  2019-02-02  0:07 ` [RFC PATCH v2 3/4] net: core: use a dynamic_call for dst_input Edward Cree
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2019-02-02  0:07 UTC (permalink / raw)
  To: namit, jpoimboe; +Cc: linux-kernel, x86

Typically a small number of callees, such as ip[v6]_rcv or packet_rcv,
 will cover most packets.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/core/dev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8e276e0192a1..7b38a33689d8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -146,6 +146,7 @@
 #include <net/udp_tunnel.h>
 #include <linux/net_namespace.h>
 #include <linux/indirect_call_wrapper.h>
+#include <linux/dynamic_call.h>
 
 #include "net-sysfs.h"
 
@@ -1949,6 +1950,9 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
+DYNAMIC_CALL_4(int, deliver_skb, struct sk_buff *, struct net_device *,
+	       struct packet_type *, struct net_device *);
+
 static inline int deliver_skb(struct sk_buff *skb,
 			      struct packet_type *pt_prev,
 			      struct net_device *orig_dev)
@@ -1956,7 +1960,7 @@ static inline int deliver_skb(struct sk_buff *skb,
 	if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
 		return -ENOMEM;
 	refcount_inc(&skb->users);
-	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+	return dynamic_deliver_skb(pt_prev->func, skb, skb->dev, pt_prev, orig_dev);
 }
 
 static inline void deliver_ptype_list_skb(struct sk_buff *skb,
@@ -4970,7 +4974,8 @@ static int __netif_receive_skb_one_core(struct sk_buff *skb, bool pfmemalloc)
 
 	ret = __netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
 	if (pt_prev)
-		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+		ret = dynamic_deliver_skb(pt_prev->func, skb, skb->dev, pt_prev,
+					  orig_dev);
 	return ret;
 }
 
@@ -5015,7 +5020,8 @@ static inline void __netif_receive_skb_list_ptype(struct list_head *head,
 		pt_prev->list_func(head, pt_prev, orig_dev);
 	else
 		list_for_each_entry_safe(skb, next, head, list)
-			pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+			dynamic_deliver_skb(pt_prev->func, skb, skb->dev,
+					    pt_prev, orig_dev);
 }
 
 static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)



* [RFC PATCH v2 3/4] net: core: use a dynamic_call for dst_input
  2019-02-02  0:05 [RFC PATCH v2 0/4] dynamic indirect call promotion Edward Cree
  2019-02-02  0:07 ` [RFC PATCH v2 1/4] static_call: add indirect call promotion (dynamic_call) infrastructure Edward Cree
  2019-02-02  0:07 ` [RFC PATCH v2 2/4] net: core: use a dynamic_call for pt_prev->func() in RX path Edward Cree
@ 2019-02-02  0:07 ` Edward Cree
  2019-02-02  0:08 ` [RFC PATCH v2 4/4] net: core: use a dynamic_call for pt_prev->list_func() in list RX path Edward Cree
  2019-02-05  8:50 ` [RFC PATCH v2 0/4] dynamic indirect call promotion Nadav Amit
  4 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2019-02-02  0:07 UTC (permalink / raw)
  To: namit, jpoimboe; +Cc: linux-kernel, x86

Typically there will be a small number of callees, such as ip_local_deliver
 or ip_forward, which will cover most packets.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/net/dst.h | 5 ++++-
 net/core/dst.c    | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 6cf0870414c7..5dd838b9a7d2 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -16,6 +16,7 @@
 #include <linux/bug.h>
 #include <linux/jiffies.h>
 #include <linux/refcount.h>
+#include <linux/dynamic_call.h>
 #include <net/neighbour.h>
 #include <asm/processor.h>
 
@@ -444,10 +445,12 @@ static inline int dst_output(struct net *net, struct sock *sk, struct sk_buff *s
 	return skb_dst(skb)->output(net, sk, skb);
 }
 
+DEFINE_DYNAMIC_CALL_1(int, dst_input, struct sk_buff *);
+
 /* Input packet from network to transport.  */
 static inline int dst_input(struct sk_buff *skb)
 {
-	return skb_dst(skb)->input(skb);
+	return dynamic_dst_input(skb_dst(skb)->input, skb);
 }
 
 static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
diff --git a/net/core/dst.c b/net/core/dst.c
index 81ccf20e2826..a00a75bab84e 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -342,3 +342,5 @@ void metadata_dst_free_percpu(struct metadata_dst __percpu *md_dst)
 	free_percpu(md_dst);
 }
 EXPORT_SYMBOL_GPL(metadata_dst_free_percpu);
+
+DYNAMIC_CALL_1(int, dst_input, struct sk_buff *);



* [RFC PATCH v2 4/4] net: core: use a dynamic_call for pt_prev->list_func() in list RX path
  2019-02-02  0:05 [RFC PATCH v2 0/4] dynamic indirect call promotion Edward Cree
                   ` (2 preceding siblings ...)
  2019-02-02  0:07 ` [RFC PATCH v2 3/4] net: core: use a dynamic_call for dst_input Edward Cree
@ 2019-02-02  0:08 ` Edward Cree
  2019-02-05  8:50 ` [RFC PATCH v2 0/4] dynamic indirect call promotion Nadav Amit
  4 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2019-02-02  0:08 UTC (permalink / raw)
  To: namit, jpoimboe; +Cc: linux-kernel, x86

There are currently only two possible callees, ip_list_rcv and
 ipv6_list_rcv.  Even when more are added, most packets will typically
 follow one of a small number of callees on any given system.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/core/dev.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7b38a33689d8..ecf41618a279 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5006,6 +5006,9 @@ int netif_receive_skb_core(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_receive_skb_core);
 
+DYNAMIC_CALL_3(void, deliver_skb_list, struct list_head *, struct packet_type *,
+	       struct net_device *);
+
 static inline void __netif_receive_skb_list_ptype(struct list_head *head,
 						  struct packet_type *pt_prev,
 						  struct net_device *orig_dev)
@@ -5017,7 +5020,8 @@ static inline void __netif_receive_skb_list_ptype(struct list_head *head,
 	if (list_empty(head))
 		return;
 	if (pt_prev->list_func != NULL)
-		pt_prev->list_func(head, pt_prev, orig_dev);
+		dynamic_deliver_skb_list(pt_prev->list_func, head, pt_prev,
+					 orig_dev);
 	else
 		list_for_each_entry_safe(skb, next, head, list)
 			dynamic_deliver_skb(pt_prev->func, skb, skb->dev,


* Re: [RFC PATCH v2 0/4] dynamic indirect call promotion
  2019-02-02  0:05 [RFC PATCH v2 0/4] dynamic indirect call promotion Edward Cree
                   ` (3 preceding siblings ...)
  2019-02-02  0:08 ` [RFC PATCH v2 4/4] net: core: use a dynamic_call for pt_prev->list_func() in list RX path Edward Cree
@ 2019-02-05  8:50 ` Nadav Amit
  2019-02-15 17:21   ` Edward Cree
  4 siblings, 1 reply; 8+ messages in thread
From: Nadav Amit @ 2019-02-05  8:50 UTC (permalink / raw)
  To: Edward Cree; +Cc: Josh Poimboeuf, LKML, x86

> On Feb 1, 2019, at 4:05 PM, Edward Cree <ecree@solarflare.com> wrote:
> 
> This series introduces 'dynamic_calls', branch trees of static calls (updated
> at runtime using text patching), to avoid making indirect calls to common
> targets.  The basic mechanism is
>    if (func == static_key_1.target)
>        call_static_key_1(args);
>    else if (func == static_key_2.target)
>        call_static_key_2(args);
>    /* ... */
>    else
>        (*func)(args); /* typically leads to a retpoline nowadays */
> with some additional statistics-gathering to allow periodic relearning of
> branch targets.  Creating and calling a dynamic call table are each a single
> line in the consuming code, although they expand to a nontrivial amount of
> data and text in the kernel image.
> This is essentially indirect branch prediction, performed in software because
> we can't trust hardware to get it right.  While the processor may speculate
> into the function calls, this is *probably* OK since they are known to be
> functions that frequently get called in this path, and thus are much less
> likely to contain side-channel information leaks than a completely arbitrary
> target PC from a branch target buffer.

My rationale is that it is ok since I presume that even after Spectre v2 is
addressed in HW, speculation might be possible using valid BTB targets
(matching the source RIP). This is somewhat equivalent to having software
prediction.

> Moreover, when the speculation is
> accurate we positively want to be able to speculate into the callee.
> The branch target statistics are collected with percpu variables, counting
> both 'hits' on the existing branch targets and 'misses', divided into counts
> for up to four specific targets (first-come-first-served) and a catch-all
> miss count used once that table is full.
> When the total number of specific misses on a cpu reaches 1000, work is
> triggered which adds up counts across all CPUs and chooses the two most-
> popular call targets to patch into the call path.
> If instead the catch-all miss count reaches 1000, the counts and specific
> targets for that cpu are discarded, since either the target is too
> unpredictable (lots of low-frequency callees rather than a few dominating
> ones) or the targets that populated the table were by chance unpopular ones.
> To ensure that the static key target does not change between the if () check
> and the call, the whole dynamic_call must take place in an RCU read-side
> critical section (which, since the callee does not know it is being called in
> this manner, then lasts at least until the callee returns), and the patching
> at re-learning time is done with the help of a static_key to switch callers
> off the dynamic_call path and RCU synchronisation to ensure none are still on
> it.  In cases where RCU cannot be used (e.g. because some callees need to RCU
> synchronise), it might be possible to add a variant that uses
> synchronize_rcu_tasks() when updating, but this series does not attempt this.

I wonder why. This seems like an easy solution, and according to Josh, Steven
Rostedt, and the documentation, it appears to be valid.

> 
> The dynamic_calls created by this series are opt-in, partly because of the
> abovementioned rcu_read_lock requirement.
> 
> My attempts to measure the performance impact of dynamic_calls have been
> inconclusive; the effects on an RX-side UDP packet rate test were within
> ±1.5% and nowhere near statistical significance (p around 0.2-0.3 with n=6
> in a Welch t-test).  This could mean that dynamic_calls are ineffective,
> but it could also mean that many more sites need converting before any gain
> shows up, or it could just mean that my testing was insufficiently sensitive
> or measuring the wrong thing.  Given these poor results, this series is
> clearly not 'ready', hence the RFC tags, but hopefully it will inform the
> discussion in this area.

So I wasted quite some time on this profiling/implementation, and the
results that you got are not surprising. I am afraid that Josh and you are
repeating some of the things I did before, which I now consider to be
“mistakes”.

Like you, I initially used a bunch of C macros to hook into the call
locations (although yours are nicer than mine). Similarly to this patch-set,
I originally change calling locations semi-manually, through an endless
process of: recording indirect branches (using performance counters);
running a python script to create a Coccinelle script to change the callers
and function definitions; and then applying the patches and fixing whatever
got broken.

It took me a while to understand this is the wrong approach. The callers
IMHO should not be changed - programmers should not need to understand some
macro is actually a function call. The complaints that PeterZ and others
give regarding PV infrastructure are relevant here as well.

I presume that the reason you did not see a performance improvement is that
there are hundreds of call sites in the network stack that need to be
modified. Most of the function pointers are concentrated in all kind of
“ops” structures, so it is feasible to annotate them. Yet, changing the
callers will make the code somewhat ugly.

Indeed, it is possible to optimize your code. For example using some variant
of this_cpu_add() (*) would be better than this_cpu_ptr() followed by an add
operation. Or avoiding the WARN_ON_ONCE() if debug is not enabled somehow.
Yet, I don’t think it is the right approach.
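To illustrate the this_cpu_*() point: the generated fast path could bump its
counter with something roughly like (untested sketch)

	this_cpu_inc(_name##_dc_pc.hit_count[0]);

instead of going through this_cpu_ptr() and an increment on the returned
pointer.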

As I stated before, I think that the best solution is to use a GCC plugin,
similar to the version that I have sent before. I have an updated version
that allows custom attributes to be used to opt in certain branches and,
potentially, if you insist on inlining multiple branches, to provide the number
of branches (**). Such a solution will not enable the calling code to be
written in C and would require a plugin for each architecture. Nevertheless,
the code would be more readable.

Feel free to try my code and give me feedback. I did not get feedback on my
last version. Is there a fundamental problem with my plugin? Did you try it
and get bad results, or perhaps it did not build? Why do you prefer an approach
which requires annotation of the callers, instead of something that is much
more transparent?


(*) a variant that does not yell since you didn’t disable preemption, but
does not prevent preemption or disable IRQs.

(**) I didn’t send the latest version yet, since I’m still struggling to make
it compatible with the PV infrastructure. If you want, I'll send the code as
it is right now.



* Re: [RFC PATCH v2 0/4] dynamic indirect call promotion
  2019-02-05  8:50 ` [RFC PATCH v2 0/4] dynamic indirect call promotion Nadav Amit
@ 2019-02-15 17:21   ` Edward Cree
  2019-02-18 16:22     ` Nadav Amit
  0 siblings, 1 reply; 8+ messages in thread
From: Edward Cree @ 2019-02-15 17:21 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Josh Poimboeuf, LKML, x86

On 05/02/19 08:50, Nadav Amit wrote:
>> In cases where RCU cannot be used (e.g. because some callees need to RCU
>> synchronise), it might be possible to add a variant that uses
>> synchronize_rcu_tasks() when updating, but this series does not attempt this.
> I wonder why.
Mainly because I have yet to convince myself that it's the Right Thing.
Note also the following (from kernel/rcu/update.c):

/*
 * This is a very specialized primitive, intended only for a few uses in
* tracing and other situations requiring manipulation of function
* preambles and profiling hooks. The synchronize_rcu_tasks() function
* is not (yet) intended for heavy use from multiple CPUs.  */

> This seems like an easy solution, and according to Josh, Steven
> Rostedt, and the documentation, it appears to be valid.
Will it hurt performance, though, if we end up (say) having rcu-tasks-
 based synchronisation for updates on every indirect call in the kernel?
(As would result from a plugin-based opt-out approach.)

> As I stated before, I think that the best solution is to use a GCC plugin,
> [...] Such a solution will not enable the calling code to be
> written in C and would require a plugin for each architecture.
I'm afraid I don't see why.  If we use the static_calls infrastructure,
 but then do a source-level transformation in the compiler plugin to turn
 indirect calls into dynamic_calls, it should be possible to create an
 opt-out system without any arch-specific code in the plugin (the arch-
 specific stuff being all in the static_calls code).
Any reason that can't be done?  (Note: I don't know much about GCC
 internals, maybe there's something obvious that stops a plugin doing
 things like that.)

>> Feel free to try my code and give me feedback. I did not get feedback on my
>> last version. Is there a fundamental problem with my plugin? Did you try it
>> and get bad results, or perhaps it did not build?
I didn't test your patches yet, because I was busy trying to get mine
 working and ready to post (and also with unrelated work).  But now that
 that's done, next time I have cycles spare for indirect call stuff I
 guess testing (and reviewing) your approach will be next on my list.

> Why do you prefer an approach
> which requires annotation of the callers, instead of something that is much
> more transparent?
I'm concerned about the overhead (in both time and memory) of running
 learning on every indirect call site (including ones that aren't in a
 hot-path, and ones which have such a wide variety of callees that
 promotion really doesn't help) throughout the whole kernel.  Also, an
 annotating programmer knows the locking/rcu context and can thus tell
 whether a given dynamic_call should use synchronize_rcu_tasks(),
 synchronize_rcu(), or perhaps something else (if e.g. the call always
 happens under a mutex, then the updater work could take that mutex).

The real answer, though, is that I don't so much prefer this approach,
 as think that both should be tried "publicly" and evaluated by more
 developers than just us three.  There's a reason this series is
 marked RFC ;-)


-Ed


* Re: [RFC PATCH v2 0/4] dynamic indirect call promotion
  2019-02-15 17:21   ` Edward Cree
@ 2019-02-18 16:22     ` Nadav Amit
  0 siblings, 0 replies; 8+ messages in thread
From: Nadav Amit @ 2019-02-18 16:22 UTC (permalink / raw)
  To: Edward Cree; +Cc: Josh Poimboeuf, LKML, x86

> On Feb 15, 2019, at 9:21 AM, Edward Cree <ecree@solarflare.com> wrote:
> 
> On 05/02/19 08:50, Nadav Amit wrote:
>>> In cases where RCU cannot be used (e.g. because some callees need to RCU
>>> synchronise), it might be possible to add a variant that uses
>>> synchronize_rcu_tasks() when updating, but this series does not attempt this.
>> I wonder why.
> Mainly because I have yet to convince myself that it's the Right Thing.
> Note also the following (from kernel/rcu/update.c):
> 
> /*
> * This is a very specialized primitive, intended only for a few uses in
> * tracing and other situations requiring manipulation of function
> * preambles and profiling hooks. The synchronize_rcu_tasks() function
> * is not (yet) intended for heavy use from multiple CPUs.  */
> 
>> This seems like an easy solution, and according to Josh, Steven
>> Rostedt, and the documentation, it appears to be valid.
> Will it hurt performance, though, if we end up (say) having rcu-tasks-
>  based synchronisation for updates on every indirect call in the kernel?
> (As would result from a plugin-based opt-out approach.)

That’s what batching is for..

> 
>> As I stated before, I think that the best solution is to use a GCC plugin,
>> [...] Such a solution will not enable the calling code to be
>> written in C and would require a plugin for each architecture.
> I'm afraid I don't see why.  If we use the static_calls infrastructure,
>  but then do a source-level transformation in the compiler plugin to turn
>  indirect calls into dynamic_calls, it should be possible to create an
>  opt-out system without any arch-specific code in the plugin (the arch-
>  specific stuff being all in the static_calls code).
> Any reason that can't be done?  (Note: I don't know much about GCC
>  internals, maybe there's something obvious that stops a plugin doing
>  things like that.)

Hmm… I think you are right. It may be possible by hooking into the
PLUGIN_START_PARSE_FUNCTION or PLUGIN_FINISH_PARSE_FUNCTION events. But I
think source-code manipulation is likely to be more error-prone and “dirty”.
I think that assembly is the right level to deal with indirect calls anyhow,
specifically if the same mechanism is used for callee-saved functions.

>> Feel free to try my code and give me feedback. I did not get feedback on my
>> last version. Is there a fundamental problem with my plugin? Did you try it
>> and get bad results, or perhaps it did not build?
> I didn't test your patches yet, because I was busy trying to get mine
>  working and ready to post (and also with unrelated work).  But now that
>  that's done, next time I have cycles spare for indirect call stuff I
>  guess testing (and reviewing) your approach will be next on my list.
> 
>> Why do you prefer an approach
>> which requires annotation of the callers, instead of something that is much
>> more transparent?
> I'm concerned about the overhead (in both time and memory) of running
>  learning on every indirect call site (including ones that aren't in a
>  hot-path, and ones which have such a wide variety of callees that
>  promotion really doesn't help) throughout the whole kernel.  Also, an
>  annotating programmer knows the locking/rcu context and can thus tell
>  whether a given dynamic_call should use synchronize_rcu_tasks(),
>  synchronize_rcu(), or perhaps something else (if e.g. the call always
>  happens under a mutex, then the updater work could take that mutex).
> 
> The real answer, though, is that I don't so much prefer this approach,
>  as think that both should be tried "publicly" and evaluated by more
>  developers than just us three.  There's a reason this series is
>  marked RFC ;-)

Reading my email from ~2 weeks ago - I realize I don’t really understand my
own question. Clearly, annotation is better (if possible). My point was
mainly that it is a tedious job to annotate all the locations, and there are
quite a few.


