* [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist
@ 2020-02-20 20:45 Thomas Gleixner
2020-02-20 20:45 ` [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs Thomas Gleixner
` (19 more replies)
0 siblings, 20 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
Hi!
This is the second version of the BPF/RT patch set which makes both coexist
nicely. The long explanation can be found in the cover letter of the V1
submission:
https://lore.kernel.org/r/20200214133917.304937432@linutronix.de
The following changes vs. V1 have been made:
- New patch to enforce preallocation for all instrumentation type
programs
- New patches which make the recursion protection safe against preemption
on RT (Mathieu)
- New patch which removes the unnecessary recursion protection around
the rcu_free() invocation
- Converted macro to inline (Mathieu)
- Added explanation about the seccomp loop to the changelog (Vinicius)
- Fixed the 'explicitely' typos (Jakub)
- Dropped the migrate* stubs patches and merged them into the tip
tree. See below.
The series applies on top of:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-for-bpf-2020-02-20
This tag contains only the two migrate stub commits on top of 5.6-rc2 and
can be pulled into BPF. The commits are immutable and will also be carried
in tip so further changes in this area can be applied.
Thanks,
tglx
---
include/linux/bpf.h | 38 ++++++++-
include/linux/filter.h | 33 ++++++--
kernel/bpf/hashtab.c | 172 ++++++++++++++++++++++++++++++-------------
kernel/bpf/lpm_trie.c | 12 +--
kernel/bpf/percpu_freelist.c | 20 ++---
kernel/bpf/stackmap.c | 18 +++-
kernel/bpf/syscall.c | 27 ++----
kernel/bpf/trampoline.c | 9 +-
kernel/bpf/verifier.c | 18 ++--
kernel/events/core.c | 2
kernel/seccomp.c | 4 -
kernel/trace/bpf_trace.c | 6 -
lib/test_bpf.c | 4 -
net/bpf/test_run.c | 8 +-
net/core/flow_dissector.c | 4 -
net/core/skmsg.c | 8 --
net/kcm/kcmsock.c | 4 -
17 files changed, 252 insertions(+), 135 deletions(-)
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-22 4:29 ` Alexei Starovoitov
2020-02-22 16:44 ` kbuild test robot
2020-02-20 20:45 ` [patch V2 02/20] bpf: Update locking comment in hashtab code Thomas Gleixner
` (18 subsequent siblings)
19 siblings, 2 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The assumption that only programs attached to perf NMI events can deadlock
on memory allocators is wrong. Assume the following simplified callchain:
kmalloc() from regular non BPF context
cache empty
freelist empty
lock(zone->lock);
tracepoint or kprobe
BPF()
update_elem()
lock(bucket)
kmalloc()
cache empty
freelist empty
lock(zone->lock); <- DEADLOCK
There are other ways to create wreckage which do not involve locking:
kmalloc() from regular non BPF context
local_irq_save();
...
obj = percpu_slab_first();
kprobe()
BPF()
update_elem()
lock(bucket)
kmalloc()
local_irq_save();
...
obj = percpu_slab_first(); <- Same object as above ...
So preallocation _must_ be enforced for all variants of intrusive
instrumentation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
kernel/bpf/verifier.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8144,19 +8144,23 @@ static int check_map_prog_compatibility(
struct bpf_prog *prog)
{
- /* Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use
- * preallocated hash maps, since doing memory allocation
- * in overflow_handler can crash depending on where nmi got
- * triggered.
+ /*
+ * Make sure that trace type programs only use preallocated hash
+ * maps. Perf programs obviously can't do memory allocation in NMI
+ * context and all other types can deadlock on a memory allocator
+ * lock when a tracepoint/kprobe triggers a BPF program inside a
+ * lock held region or create inconsistent state when the probe is
+ * within an interrupts disabled critical region in the memory
+ * allocator.
*/
- if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
+ if (is_tracing_prog_type(prog->type)) {
if (!check_map_prealloc(map)) {
- verbose(env, "perf_event programs can only use preallocated hash map\n");
+ verbose(env, "tracing programs can only use preallocated hash map\n");
return -EINVAL;
}
if (map->inner_map_meta &&
!check_map_prealloc(map->inner_map_meta)) {
- verbose(env, "perf_event programs can only use preallocated inner hash map\n");
+ verbose(env, "tracing programs can only use preallocated inner hash map\n");
return -EINVAL;
}
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 02/20] bpf: Update locking comment in hashtab code
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
2020-02-20 20:45 ` [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 03/20] bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run() Thomas Gleixner
` (17 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The comment where the bucket lock is acquired says:
/* bpf_map_update_elem() can be called in_irq() */
which is not really helpful and, aside from that, does not explain the
subtle details of the hash bucket locks, especially in the context of BPF
and perf, kprobes and tracing.
Add a comment at the top of the file which explains the protection scopes
and the details how potential deadlocks are prevented.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/hashtab.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -27,6 +27,26 @@
.map_delete_batch = \
generic_map_delete_batch
+/*
+ * The bucket lock has two protection scopes:
+ *
+ * 1) Serializing concurrent operations from BPF programs on different
+ * CPUs
+ *
+ * 2) Serializing concurrent operations from BPF programs and sys_bpf()
+ *
+ * BPF programs can execute in any context including perf, kprobes and
+ * tracing. As there are almost no limits where perf, kprobes and tracing
+ * can be invoked from the lock operations need to be protected against
+ * deadlocks. Deadlocks can be caused by recursion and by an invocation in
+ * the lock held section when functions which acquire this lock are invoked
+ * from sys_bpf(). BPF recursion is prevented by incrementing the per CPU
+ * variable bpf_prog_active, which prevents BPF programs attached to perf
+ * events, kprobes and tracing to be invoked before the prior invocation
+ * from one of these contexts completed. sys_bpf() uses the same mechanism
+ * by pinning the task to the current CPU and incrementing the recursion
+ * protection across the map operation.
+ */
struct bucket {
struct hlist_nulls_head head;
raw_spinlock_t lock;
@@ -872,7 +892,6 @@ static int htab_map_update_elem(struct b
*/
}
- /* bpf_map_update_elem() can be called in_irq() */
raw_spin_lock_irqsave(&b->lock, flags);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -952,7 +971,6 @@ static int htab_lru_map_update_elem(stru
return -ENOMEM;
memcpy(l_new->key + round_up(map->key_size, 8), value, map->value_size);
- /* bpf_map_update_elem() can be called in_irq() */
raw_spin_lock_irqsave(&b->lock, flags);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1007,7 +1025,6 @@ static int __htab_percpu_map_update_elem
b = __select_bucket(htab, hash);
head = &b->head;
- /* bpf_map_update_elem() can be called in_irq() */
raw_spin_lock_irqsave(&b->lock, flags);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1071,7 +1088,6 @@ static int __htab_lru_percpu_map_update_
return -ENOMEM;
}
- /* bpf_map_update_elem() can be called in_irq() */
raw_spin_lock_irqsave(&b->lock, flags);
l_old = lookup_elem_raw(head, hash, key, key_size);
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 03/20] bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run()
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
2020-02-20 20:45 ` [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs Thomas Gleixner
2020-02-20 20:45 ` [patch V2 02/20] bpf: Update locking comment in hashtab code Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 04/20] perf/bpf: Remove preempt disable around BPF invocation Thomas Gleixner
` (16 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
__bpf_trace_run() disables preemption around the BPF_PROG_RUN() invocation.
This is redundant because __bpf_trace_run() is invoked from a trace point
via __DO_TRACE() which already disables preemption _before_ invoking any of
the functions which are attached to a trace point.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/trace/bpf_trace.c | 2 --
1 file changed, 2 deletions(-)
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1476,9 +1476,7 @@ static __always_inline
void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
{
rcu_read_lock();
- preempt_disable();
(void) BPF_PROG_RUN(prog, args);
- preempt_enable();
rcu_read_unlock();
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 04/20] perf/bpf: Remove preempt disable around BPF invocation
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (2 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 03/20] bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run() Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 05/20] bpf: Remove recursion prevention from rcu free callback Thomas Gleixner
` (15 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The BPF invocation from the perf event overflow handler does not require
disabling preemption because it is called from NMI or at least hard
interrupt context, which is already non-preemptible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/events/core.c | 2 --
1 file changed, 2 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9206,7 +9206,6 @@ static void bpf_overflow_handler(struct
int ret = 0;
ctx.regs = perf_arch_bpf_user_pt_regs(regs);
- preempt_disable();
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
goto out;
rcu_read_lock();
@@ -9214,7 +9213,6 @@ static void bpf_overflow_handler(struct
rcu_read_unlock();
out:
__this_cpu_dec(bpf_prog_active);
- preempt_enable();
if (!ret)
return;
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 05/20] bpf: Remove recursion prevention from rcu free callback
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (3 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 04/20] perf/bpf: Remove preempt disable around BPF invocation Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 06/20] bpf: Dont iterate over possible CPUs with interrupts disabled Thomas Gleixner
` (14 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
If an element is freed via RCU then recursion into BPF instrumentation
functions is not a concern. The element is already detached from the map
and the RCU callback does not hold any locks on which a kprobe, perf event
or tracepoint attached BPF program could deadlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
kernel/bpf/hashtab.c | 8 --------
1 file changed, 8 deletions(-)
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -694,15 +694,7 @@ static void htab_elem_free_rcu(struct rc
struct htab_elem *l = container_of(head, struct htab_elem, rcu);
struct bpf_htab *htab = l->htab;
- /* must increment bpf_prog_active to avoid kprobe+bpf triggering while
- * we're calling kfree, otherwise deadlock is possible if kprobes
- * are placed somewhere inside of slub
- */
- preempt_disable();
- __this_cpu_inc(bpf_prog_active);
htab_elem_free(htab, l);
- __this_cpu_dec(bpf_prog_active);
- preempt_enable();
}
static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 06/20] bpf: Dont iterate over possible CPUs with interrupts disabled
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (4 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 05/20] bpf: Remove recursion prevention from rcu free callback Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 07/20] bpf: Provide bpf_prog_run_pin_on_cpu() helper Thomas Gleixner
` (13 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
pcpu_freelist_populate() is disabling interrupts and then iterates over the
possible CPUs. The reason why this disables interrupts is to silence
lockdep because the invoked ___pcpu_freelist_push() takes spin locks.
Neither the interrupt disabling nor the locking are required in this
function because it's called during initialization and the resulting map is
not yet visible to anything.
Split out the actual push assignment into an inline function, call it from
the loop and remove the interrupt disable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/percpu_freelist.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -25,12 +25,18 @@ void pcpu_freelist_destroy(struct pcpu_f
free_percpu(s->freelist);
}
+static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head,
+ struct pcpu_freelist_node *node)
+{
+ node->next = head->first;
+ head->first = node;
+}
+
static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head,
struct pcpu_freelist_node *node)
{
raw_spin_lock(&head->lock);
- node->next = head->first;
- head->first = node;
+ pcpu_freelist_push_node(head, node);
raw_spin_unlock(&head->lock);
}
@@ -56,21 +62,16 @@ void pcpu_freelist_populate(struct pcpu_
u32 nr_elems)
{
struct pcpu_freelist_head *head;
- unsigned long flags;
int i, cpu, pcpu_entries;
pcpu_entries = nr_elems / num_possible_cpus() + 1;
i = 0;
- /* disable irq to workaround lockdep false positive
- * in bpf usage pcpu_freelist_populate() will never race
- * with pcpu_freelist_push()
- */
- local_irq_save(flags);
for_each_possible_cpu(cpu) {
again:
head = per_cpu_ptr(s->freelist, cpu);
- ___pcpu_freelist_push(head, buf);
+ /* No locking required as this is not visible yet. */
+ pcpu_freelist_push_node(head, buf);
i++;
buf += elem_size;
if (i == nr_elems)
@@ -78,7 +79,6 @@ void pcpu_freelist_populate(struct pcpu_
if (i % pcpu_entries)
goto again;
}
- local_irq_restore(flags);
}
struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s)
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 07/20] bpf: Provide bpf_prog_run_pin_on_cpu() helper
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (5 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 06/20] bpf: Dont iterate over possible CPUs with interrupts disabled Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 08/20] bpf: Replace cant_sleep() with cant_migrate() Thomas Gleixner
` (12 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
BPF programs need to run on one CPU to completion as they use per-CPU
storage, but according to Alexei they don't need reentrancy protection:
BPF programs running in thread context can obviously always be 'preempted'
by hard and soft interrupts and instrumentation, and the same program can
run concurrently on a different CPU.
The currently used mechanism to ensure CPUness is to wrap the invocation
into a preempt_disable/enable() pair. Disabling preemption is also
disabling migration for a task.
preempt_disable/enable() is used because there is no explicit way to
reliably disable only migration.
Provide a separate inline helper to invoke a BPF program which can be used
in migratable task context.
It wraps BPF_PROG_RUN() in a migrate_disable/enable() pair which maps on
non RT enabled kernels to preempt_disable/enable(). On RT enabled kernels
this merely disables migration. Both methods ensure that the invoked BPF
program runs on one CPU to completion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Use an inline function (Mathieu)
---
include/linux/filter.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -579,6 +579,28 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabl
#define BPF_PROG_RUN(prog, ctx) __BPF_PROG_RUN(prog, ctx, \
bpf_dispatcher_nopfunc)
+/*
+ * Use in preemptible and therefore migratable context to make sure that
+ * the execution of the BPF program runs on one CPU.
+ *
+ * This uses migrate_disable/enable() explicitly to document that the
+ * invocation of a BPF program does not require reentrancy protection
+ * against a BPF program which is invoked from a preempting task.
+ *
+ * For non RT enabled kernels migrate_disable/enable() maps to
+ * preempt_disable/enable(), i.e. it disables also preemption.
+ */
+static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
+ void *ctx)
+{
+ u32 ret;
+
+ migrate_disable();
+ ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nopfunc);
+ migrate_enable();
+ return ret;
+}
+
#define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
struct bpf_skb_data_end {
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 08/20] bpf: Replace cant_sleep() with cant_migrate()
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (6 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 07/20] bpf: Provide bpf_prog_run_pin_on_cpu() helper Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 09/20] bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites Thomas Gleixner
` (11 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
As already discussed in the previous change which introduced
bpf_prog_run_pin_on_cpu(), BPF only requires disabling migration to
guarantee per-CPUness.
If RT substitutes the preempt disable based migration protection then the
cant_sleep() check will obviously trigger as preemption is not disabled.
Replace it by cant_migrate() which maps to cant_sleep() on a non RT kernel
and will verify that migration is disabled on a full RT kernel.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/filter.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -561,7 +561,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabl
#define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \
u32 ret; \
- cant_sleep(); \
+ cant_migrate(); \
if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
struct bpf_prog_stats *stats; \
u64 start = sched_clock(); \
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 09/20] bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites.
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (7 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 08/20] bpf: Replace cant_sleep() with cant_migrate() Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 10/20] trace/bpf: Use migrate disable in trace_call_bpf() Thomas Gleixner
` (10 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
From: David Miller <davem@davemloft.net>
All of these cases are strictly of the form:
preempt_disable();
BPF_PROG_RUN(...);
preempt_enable();
Replace this with bpf_prog_run_pin_on_cpu() which wraps BPF_PROG_RUN()
with:
migrate_disable();
BPF_PROG_RUN(...);
migrate_enable();
On non RT enabled kernels this maps to preempt_disable/enable() and on RT
enabled kernels this solely prevents migration, which is sufficient as
there is no requirement to prevent reentrancy to any BPF program from a
preempting task. The only requirement is that the program stays on the same
CPU.
The seccomp loop does not need protection across the whole loop. It only
needs protection per BPF filter program.
Therefore, this is a trivially correct transformation.
[ tglx: Converted to bpf_prog_run_pin_on_cpu() ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: No change. Amended changelog vs. seccomp
---
include/linux/filter.h | 4 +---
kernel/seccomp.c | 4 +---
net/core/flow_dissector.c | 4 +---
net/core/skmsg.c | 8 ++------
net/kcm/kcmsock.c | 4 +---
5 files changed, 6 insertions(+), 18 deletions(-)
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -717,9 +717,7 @@ static inline u32 bpf_prog_run_clear_cb(
if (unlikely(prog->cb_access))
memset(cb_data, 0, BPF_SKB_CB_LEN);
- preempt_disable();
- res = BPF_PROG_RUN(prog, skb);
- preempt_enable();
+ res = bpf_prog_run_pin_on_cpu(prog, skb);
return res;
}
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -268,16 +268,14 @@ static u32 seccomp_run_filters(const str
* All filters in the list are evaluated and the lowest BPF return
* value always takes priority (ignoring the DATA).
*/
- preempt_disable();
for (; f; f = f->prev) {
- u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
+ u32 cur_ret = bpf_prog_run_pin_on_cpu(f->prog, sd);
if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
ret = cur_ret;
*match = f;
}
}
- preempt_enable();
return ret;
}
#endif /* CONFIG_SECCOMP_FILTER */
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -920,9 +920,7 @@ bool bpf_flow_dissect(struct bpf_prog *p
(int)FLOW_DISSECTOR_F_STOP_AT_ENCAP);
flow_keys->flags = flags;
- preempt_disable();
- result = BPF_PROG_RUN(prog, ctx);
- preempt_enable();
+ result = bpf_prog_run_pin_on_cpu(prog, ctx);
flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, nhoff, hlen);
flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -628,7 +628,6 @@ int sk_psock_msg_verdict(struct sock *sk
struct bpf_prog *prog;
int ret;
- preempt_disable();
rcu_read_lock();
prog = READ_ONCE(psock->progs.msg_parser);
if (unlikely(!prog)) {
@@ -638,7 +637,7 @@ int sk_psock_msg_verdict(struct sock *sk
sk_msg_compute_data_pointers(msg);
msg->sk = sk;
- ret = BPF_PROG_RUN(prog, msg);
+ ret = bpf_prog_run_pin_on_cpu(prog, msg);
ret = sk_psock_map_verd(ret, msg->sk_redir);
psock->apply_bytes = msg->apply_bytes;
if (ret == __SK_REDIRECT) {
@@ -653,7 +652,6 @@ int sk_psock_msg_verdict(struct sock *sk
}
out:
rcu_read_unlock();
- preempt_enable();
return ret;
}
EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
@@ -665,9 +663,7 @@ static int sk_psock_bpf_run(struct sk_ps
skb->sk = psock->sk;
bpf_compute_data_end_sk_skb(skb);
- preempt_disable();
- ret = BPF_PROG_RUN(prog, skb);
- preempt_enable();
+ ret = bpf_prog_run_pin_on_cpu(prog, skb);
/* strparser clones the skb before handing it to a upper layer,
* meaning skb_orphan has been called. We NULL sk on the way out
* to ensure we don't trigger a BUG_ON() in skb/sk operations
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -380,9 +380,7 @@ static int kcm_parse_func_strparser(stru
struct bpf_prog *prog = psock->bpf_prog;
int res;
- preempt_disable();
- res = BPF_PROG_RUN(prog, skb);
- preempt_enable();
+ res = bpf_prog_run_pin_on_cpu(prog, skb);
return res;
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 10/20] trace/bpf: Use migrate disable in trace_call_bpf()
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (8 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 09/20] bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 11/20] bpf/tests: Use migrate disable instead of preempt disable Thomas Gleixner
` (9 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
BPF does not require preemption to be disabled. It only requires staying
on the same CPU while running a program. Reflect this by replacing
preempt_disable/enable() with migrate_disable/enable() pairs.
On a non-RT kernel this maps to preempt_disable/enable().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/trace/bpf_trace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -83,7 +83,7 @@ unsigned int trace_call_bpf(struct trace
if (in_nmi()) /* not supported yet */
return 1;
- preempt_disable();
+ migrate_disable();
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
/*
@@ -115,7 +115,7 @@ unsigned int trace_call_bpf(struct trace
out:
__this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ migrate_enable();
return ret;
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 11/20] bpf/tests: Use migrate disable instead of preempt disable
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (9 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 10/20] trace/bpf: Use migrate disable in trace_call_bpf() Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 12/20] bpf: Use migrate_disable/enabe() in trampoline code Thomas Gleixner
` (8 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
From: David Miller <davem@davemloft.net>
Replace the preemption disable/enable with migrate_disable/enable() to
reflect the actual requirement and to allow PREEMPT_RT to substitute it
with an actual migration disable mechanism which does not disable
preemption.
[ tglx: Switched it over to migrate disable ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
lib/test_bpf.c | 4 ++--
net/bpf/test_run.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6660,14 +6660,14 @@ static int __run_one(const struct bpf_pr
u64 start, finish;
int ret = 0, i;
- preempt_disable();
+ migrate_disable();
start = ktime_get_ns();
for (i = 0; i < runs; i++)
ret = BPF_PROG_RUN(fp, data);
finish = ktime_get_ns();
- preempt_enable();
+ migrate_enable();
*duration = finish - start;
do_div(*duration, runs);
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -37,7 +37,7 @@ static int bpf_test_run(struct bpf_prog
repeat = 1;
rcu_read_lock();
- preempt_disable();
+ migrate_disable();
time_start = ktime_get_ns();
for (i = 0; i < repeat; i++) {
bpf_cgroup_storage_set(storage);
@@ -54,18 +54,18 @@ static int bpf_test_run(struct bpf_prog
if (need_resched()) {
time_spent += ktime_get_ns() - time_start;
- preempt_enable();
+ migrate_enable();
rcu_read_unlock();
cond_resched();
rcu_read_lock();
- preempt_disable();
+ migrate_disable();
time_start = ktime_get_ns();
}
}
time_spent += ktime_get_ns() - time_start;
- preempt_enable();
+ migrate_enable();
rcu_read_unlock();
do_div(time_spent, repeat);
^ permalink raw reply [flat|nested] 25+ messages in thread
* [patch V2 12/20] bpf: Use migrate_disable/enabe() in trampoline code.
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (10 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 11/20] bpf/tests: Use migrate disable instead of preempt disable Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 13/20] bpf: Use migrate_disable/enable in array macros and cgroup/lirc code Thomas Gleixner
` (7 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
From: David Miller <davem@davemloft.net>
Use migrate_disable/enable() instead of preemption disable/enable to
reflect the purpose. This allows PREEMPT_RT to substitute it with an
actual migration disable implementation. On non RT kernels this is still
mapped to preempt_disable/enable().
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/trampoline.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -367,8 +367,9 @@ void bpf_trampoline_put(struct bpf_tramp
mutex_unlock(&trampoline_mutex);
}
-/* The logic is similar to BPF_PROG_RUN, but with explicit rcu and preempt that
- * are needed for trampoline. The macro is split into
+/* The logic is similar to BPF_PROG_RUN, but with an explicit
+ * rcu_read_lock() and migrate_disable() which are required
+ * for the trampoline. The macro is split into
* call _bpf_prog_enter
* call prog->bpf_func
* call __bpf_prog_exit
@@ -378,7 +379,7 @@ u64 notrace __bpf_prog_enter(void)
u64 start = 0;
rcu_read_lock();
- preempt_disable();
+ migrate_disable();
if (static_branch_unlikely(&bpf_stats_enabled_key))
start = sched_clock();
return start;
@@ -401,7 +402,7 @@ void notrace __bpf_prog_exit(struct bpf_
stats->nsecs += sched_clock() - start;
u64_stats_update_end(&stats->syncp);
}
- preempt_enable();
+ migrate_enable();
rcu_read_unlock();
}
* [patch V2 13/20] bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (11 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 12/20] bpf: Use migrate_disable/enable() in trampoline code Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 14/20] bpf: Use migrate_disable() in hashtab code Thomas Gleixner
` (6 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
From: David Miller <davem@davemloft.net>
Replace the preemption disable/enable with migrate_disable/enable() to
reflect the actual requirement and to allow PREEMPT_RT to substitute it
with an actual migration disable mechanism which does not disable
preemption.
This includes the code paths which go via __bpf_prog_run_save_cb().
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/bpf.h | 8 ++++----
include/linux/filter.h | 5 +++--
2 files changed, 7 insertions(+), 6 deletions(-)
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -885,7 +885,7 @@ int bpf_prog_array_copy(struct bpf_prog_
struct bpf_prog *_prog; \
struct bpf_prog_array *_array; \
u32 _ret = 1; \
- preempt_disable(); \
+ migrate_disable(); \
rcu_read_lock(); \
_array = rcu_dereference(array); \
if (unlikely(check_non_null && !_array))\
@@ -898,7 +898,7 @@ int bpf_prog_array_copy(struct bpf_prog_
} \
_out: \
rcu_read_unlock(); \
- preempt_enable(); \
+ migrate_enable(); \
_ret; \
})
@@ -932,7 +932,7 @@ int bpf_prog_array_copy(struct bpf_prog_
u32 ret; \
u32 _ret = 1; \
u32 _cn = 0; \
- preempt_disable(); \
+ migrate_disable(); \
rcu_read_lock(); \
_array = rcu_dereference(array); \
_item = &_array->items[0]; \
@@ -944,7 +944,7 @@ int bpf_prog_array_copy(struct bpf_prog_
_item++; \
} \
rcu_read_unlock(); \
- preempt_enable(); \
+ migrate_enable(); \
if (_ret) \
_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \
else \
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -677,6 +677,7 @@ static inline u8 *bpf_skb_cb(struct sk_b
return qdisc_skb_cb(skb)->data;
}
+/* Must be invoked with migration disabled */
static inline u32 __bpf_prog_run_save_cb(const struct bpf_prog *prog,
struct sk_buff *skb)
{
@@ -702,9 +703,9 @@ static inline u32 bpf_prog_run_save_cb(c
{
u32 res;
- preempt_disable();
+ migrate_disable();
res = __bpf_prog_run_save_cb(prog, skb);
- preempt_enable();
+ migrate_enable();
return res;
}
* [patch V2 14/20] bpf: Use migrate_disable() in hashtab code
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (12 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 13/20] bpf: Use migrate_disable/enable in array macros and cgroup/lirc code Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 15/20] bpf: Provide recursion prevention helpers Thomas Gleixner
` (5 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The required protection is that the caller cannot be migrated to a
different CPU as these places take either a hash bucket lock or might
trigger a kprobe inside the memory allocator. Both scenarios can lead to
deadlocks. Deadlock prevention is implemented per CPU by incrementing a per
CPU variable which temporarily blocks the invocation of BPF programs from
perf and kprobes.
Replace the preempt_disable/enable() pairs with migrate_disable/enable()
pairs to prepare BPF to work on PREEMPT_RT enabled kernels. On a non-RT
kernel this maps to preempt_disable/enable(), i.e. no functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/hashtab.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1319,7 +1319,7 @@ static int
}
again:
- preempt_disable();
+ migrate_disable();
this_cpu_inc(bpf_prog_active);
rcu_read_lock();
again_nocopy:
@@ -1339,7 +1339,7 @@ static int
raw_spin_unlock_irqrestore(&b->lock, flags);
rcu_read_unlock();
this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ migrate_enable();
goto after_loop;
}
@@ -1348,7 +1348,7 @@ static int
raw_spin_unlock_irqrestore(&b->lock, flags);
rcu_read_unlock();
this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ migrate_enable();
kvfree(keys);
kvfree(values);
goto alloc;
@@ -1398,7 +1398,7 @@ static int
rcu_read_unlock();
this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ migrate_enable();
if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
key_size * bucket_cnt) ||
copy_to_user(uvalues + total * value_size, values,
* [patch V2 15/20] bpf: Provide recursion prevention helpers
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (13 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 14/20] bpf: Use migrate_disable() in hashtab code Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 16/20] bpf: Replace open coded recursion prevention Thomas Gleixner
` (4 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The places which need to block the execution of trace type BPF programs in
order to prevent deadlocks on the hash bucket lock currently do so open coded.
Provide two inline functions, bpf_disable/enable_instrumentation(), to
replace these open coded protection constructs.
Use migrate_disable/enable() instead of preempt_disable/enable() right away
so this works on RT enabled kernels. On a !RT kernel migrate_disable/enable()
map to preempt_disable/enable().
These helpers use this_cpu_inc/dec() instead of __this_cpu_inc/dec() on an
RT enabled kernel because migrate disabled regions are preemptible and
preemption might hit in the middle of an RMW operation, which can lead to
inconsistent state.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch. Use this_cpu_inc/dec() as pointed out by Mathieu.
---
include/linux/bpf.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -961,6 +961,36 @@ int bpf_prog_array_copy(struct bpf_prog_
#ifdef CONFIG_BPF_SYSCALL
DECLARE_PER_CPU(int, bpf_prog_active);
+/*
+ * Block execution of BPF programs attached to instrumentation (perf,
+ * kprobes, tracepoints) to prevent deadlocks on map operations as any of
+ * these events can happen inside a region which holds a map bucket lock
+ * and can deadlock on it.
+ *
+ * Use the preemption safe inc/dec variants on RT because migrate disable
+ * is preemptible on RT and preemption in the middle of the RMW operation
+ * might lead to inconsistent state. Use the raw variants for non RT
+ * kernels as migrate_disable() maps to preempt_disable() so the slightly
+ * more expensive save operation can be avoided.
+ */
+static inline void bpf_disable_instrumentation(void)
+{
+ migrate_disable();
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ this_cpu_inc(bpf_prog_active);
+ else
+ __this_cpu_inc(bpf_prog_active);
+}
+
+static inline void bpf_enable_instrumentation(void)
+{
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ this_cpu_dec(bpf_prog_active);
+ else
+ __this_cpu_dec(bpf_prog_active);
+ migrate_enable();
+}
+
extern const struct file_operations bpf_map_fops;
extern const struct file_operations bpf_prog_fops;
* [patch V2 16/20] bpf: Replace open coded recursion prevention
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (14 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 15/20] bpf: Provide recursion prevention helpers Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 17/20] bpf: Factor out hashtab bucket lock operations Thomas Gleixner
` (3 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The required protection is that the caller cannot be migrated to a
different CPU as these functions end up in places which take either a hash
bucket lock or might trigger a kprobe inside the memory allocator. Both
scenarios can lead to deadlocks. Deadlock prevention is implemented per CPU
by incrementing a per CPU variable which temporarily blocks the invocation
of BPF programs from perf and kprobes.
Replace the open coded preempt_[dis|en]able and __this_cpu_[inc|dec] pairs
with the new helper functions. These functions are already prepared to
make BPF work on PREEMPT_RT enabled kernels.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
kernel/bpf/hashtab.c | 12 ++++--------
kernel/bpf/syscall.c | 27 ++++++++-------------------
2 files changed, 12 insertions(+), 27 deletions(-)
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1319,8 +1319,7 @@ static int
}
again:
- migrate_disable();
- this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
rcu_read_lock();
again_nocopy:
dst_key = keys;
@@ -1338,8 +1337,7 @@ static int
ret = -ENOSPC;
raw_spin_unlock_irqrestore(&b->lock, flags);
rcu_read_unlock();
- this_cpu_dec(bpf_prog_active);
- migrate_enable();
+ bpf_enable_instrumentation();
goto after_loop;
}
@@ -1347,8 +1345,7 @@ static int
bucket_size = bucket_cnt;
raw_spin_unlock_irqrestore(&b->lock, flags);
rcu_read_unlock();
- this_cpu_dec(bpf_prog_active);
- migrate_enable();
+ bpf_enable_instrumentation();
kvfree(keys);
kvfree(values);
goto alloc;
@@ -1397,8 +1394,7 @@ static int
}
rcu_read_unlock();
- this_cpu_dec(bpf_prog_active);
- migrate_enable();
+ bpf_enable_instrumentation();
if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
key_size * bucket_cnt) ||
copy_to_user(uvalues + total * value_size, values,
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -171,11 +171,7 @@ static int bpf_map_update_value(struct b
flags);
}
- /* must increment bpf_prog_active to avoid kprobe+bpf triggering from
- * inside bpf map update or delete otherwise deadlocks are possible
- */
- preempt_disable();
- __this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_update(map, key, value, flags);
@@ -206,8 +202,7 @@ static int bpf_map_update_value(struct b
err = map->ops->map_update_elem(map, key, value, flags);
rcu_read_unlock();
}
- __this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
maybe_wait_bpf_programs(map);
return err;
@@ -222,8 +217,7 @@ static int bpf_map_copy_value(struct bpf
if (bpf_map_is_dev_bound(map))
return bpf_map_offload_lookup_elem(map, key, value);
- preempt_disable();
- this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_copy(map, key, value);
@@ -268,8 +262,7 @@ static int bpf_map_copy_value(struct bpf
rcu_read_unlock();
}
- this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
maybe_wait_bpf_programs(map);
return err;
@@ -1136,13 +1129,11 @@ static int map_delete_elem(union bpf_att
goto out;
}
- preempt_disable();
- __this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
rcu_read_lock();
err = map->ops->map_delete_elem(map, key);
rcu_read_unlock();
- __this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
maybe_wait_bpf_programs(map);
out:
kfree(key);
@@ -1254,13 +1245,11 @@ int generic_map_delete_batch(struct bpf_
break;
}
- preempt_disable();
- __this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
rcu_read_lock();
err = map->ops->map_delete_elem(map, key);
rcu_read_unlock();
- __this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
maybe_wait_bpf_programs(map);
if (err)
break;
* [patch V2 17/20] bpf: Factor out hashtab bucket lock operations
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (15 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 16/20] bpf: Replace open coded recursion prevention Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 18/20] bpf: Prepare hashtab locking for PREEMPT_RT Thomas Gleixner
` (2 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
As a preparation for making the BPF locking RT friendly, factor out the
hash bucket lock operations into inline functions. This allows the
necessary RT modifications to be made in one place instead of being
sprinkled all over the code. No functional change.
The now unused htab argument of the lock/unlock functions will be used in
the next step which adds PREEMPT_RT support.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/hashtab.c | 69 ++++++++++++++++++++++++++++++++++-----------------
1 file changed, 46 insertions(+), 23 deletions(-)
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -87,6 +87,32 @@ struct htab_elem {
char key[0] __aligned(8);
};
+static void htab_init_buckets(struct bpf_htab *htab)
+{
+ unsigned i;
+
+ for (i = 0; i < htab->n_buckets; i++) {
+ INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
+ raw_spin_lock_init(&htab->buckets[i].lock);
+ }
+}
+
+static inline unsigned long htab_lock_bucket(const struct bpf_htab *htab,
+ struct bucket *b)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&b->lock, flags);
+ return flags;
+}
+
+static inline void htab_unlock_bucket(const struct bpf_htab *htab,
+ struct bucket *b,
+ unsigned long flags)
+{
+ raw_spin_unlock_irqrestore(&b->lock, flags);
+}
+
static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
static bool htab_is_lru(const struct bpf_htab *htab)
@@ -336,8 +362,8 @@ static struct bpf_map *htab_map_alloc(un
bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
struct bpf_htab *htab;
- int err, i;
u64 cost;
+ int err;
htab = kzalloc(sizeof(*htab), GFP_USER);
if (!htab)
@@ -399,10 +425,7 @@ static struct bpf_map *htab_map_alloc(un
else
htab->hashrnd = get_random_int();
- for (i = 0; i < htab->n_buckets; i++) {
- INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
- raw_spin_lock_init(&htab->buckets[i].lock);
- }
+ htab_init_buckets(htab);
if (prealloc) {
err = prealloc_init(htab);
@@ -610,7 +633,7 @@ static bool htab_lru_map_delete_node(voi
b = __select_bucket(htab, tgt_l->hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
if (l == tgt_l) {
@@ -618,7 +641,7 @@ static bool htab_lru_map_delete_node(voi
break;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return l == tgt_l;
}
@@ -884,7 +907,7 @@ static int htab_map_update_elem(struct b
*/
}
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -925,7 +948,7 @@ static int htab_map_update_elem(struct b
}
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return ret;
}
@@ -963,7 +986,7 @@ static int htab_lru_map_update_elem(stru
return -ENOMEM;
memcpy(l_new->key + round_up(map->key_size, 8), value, map->value_size);
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -982,7 +1005,7 @@ static int htab_lru_map_update_elem(stru
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
if (ret)
bpf_lru_push_free(&htab->lru, &l_new->lru_node);
@@ -1017,7 +1040,7 @@ static int __htab_percpu_map_update_elem
b = __select_bucket(htab, hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1040,7 +1063,7 @@ static int __htab_percpu_map_update_elem
}
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return ret;
}
@@ -1080,7 +1103,7 @@ static int __htab_lru_percpu_map_update_
return -ENOMEM;
}
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1102,7 +1125,7 @@ static int __htab_lru_percpu_map_update_
}
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
if (l_new)
bpf_lru_push_free(&htab->lru, &l_new->lru_node);
return ret;
@@ -1140,7 +1163,7 @@ static int htab_map_delete_elem(struct b
b = __select_bucket(htab, hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l = lookup_elem_raw(head, hash, key, key_size);
@@ -1150,7 +1173,7 @@ static int htab_map_delete_elem(struct b
ret = 0;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return ret;
}
@@ -1172,7 +1195,7 @@ static int htab_lru_map_delete_elem(stru
b = __select_bucket(htab, hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l = lookup_elem_raw(head, hash, key, key_size);
@@ -1181,7 +1204,7 @@ static int htab_lru_map_delete_elem(stru
ret = 0;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
if (l)
bpf_lru_push_free(&htab->lru, &l->lru_node);
return ret;
@@ -1326,7 +1349,7 @@ static int
dst_val = values;
b = &htab->buckets[batch];
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
bucket_cnt = 0;
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
@@ -1335,7 +1358,7 @@ static int
if (bucket_cnt > (max_count - total)) {
if (total == 0)
ret = -ENOSPC;
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
rcu_read_unlock();
bpf_enable_instrumentation();
goto after_loop;
@@ -1343,7 +1366,7 @@ static int
if (bucket_cnt > bucket_size) {
bucket_size = bucket_cnt;
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
rcu_read_unlock();
bpf_enable_instrumentation();
kvfree(keys);
@@ -1384,7 +1407,7 @@ static int
dst_val += value_size;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
/* If we are not copying data, we can go to next bucket and avoid
* unlocking the rcu.
*/
* [patch V2 18/20] bpf: Prepare hashtab locking for PREEMPT_RT
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (16 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 17/20] bpf: Factor out hashtab bucket lock operations Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 19/20] bpf, lpm: Make locking RT friendly Thomas Gleixner
2020-02-20 20:45 ` [patch V2 20/20] bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled Thomas Gleixner
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
PREEMPT_RT forbids certain operations like memory allocations (even with
GFP_ATOMIC) from atomic contexts. This is required because even with
GFP_ATOMIC the memory allocator calls into code paths which acquire locks
with long held lock sections. To ensure deterministic behaviour these
locks are regular spinlocks, which are converted to 'sleepable' spinlocks
on RT. The only true atomic contexts on an RT kernel are the low level
hardware handling, scheduling, low level interrupt handling, NMIs etc. None
of these contexts should ever do memory allocations.
As regular device interrupt handlers and soft interrupts are forced into
thread context, the existing code which does
spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
just works.
In theory the BPF locks could be converted to regular spinlocks as well,
but the bucket locks and percpu_freelist locks can be taken from arbitrary
contexts (perf, kprobes, tracepoints) which are required to be atomic
contexts even on RT. These mechanisms require preallocated maps, so there
is no need to invoke memory allocations within the lock held sections.
BPF maps which need dynamic allocation are only used from (forced) thread
context on RT and can therefore use regular spinlocks which in turn allows
memory allocations to be invoked from within the lock held section.
To achieve this make the hash bucket lock a union of a raw and a regular
spinlock and initialize and lock/unlock either the raw spinlock for
preallocated maps or the regular variant for maps which require memory
allocations.
On a non RT kernel this distinction is neither possible nor required.
spinlock maps to raw_spinlock and the extra code and conditional are
optimized out by the compiler. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/hashtab.c | 65 +++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 56 insertions(+), 9 deletions(-)
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -46,10 +46,43 @@
* from one of these contexts completed. sys_bpf() uses the same mechanism
* by pinning the task to the current CPU and incrementing the recursion
* protection accross the map operation.
+ *
+ * This has subtle implications on PREEMPT_RT. PREEMPT_RT forbids certain
+ * operations like memory allocations (even with GFP_ATOMIC) from atomic
+ * contexts. This is required because even with GFP_ATOMIC the memory
+ * allocator calls into code paths which acquire locks with long held lock
+ * sections. To ensure the deterministic behaviour these locks are regular
+ * spinlocks, which are converted to 'sleepable' spinlocks on RT. The only
+ * true atomic contexts on an RT kernel are the low level hardware
+ * handling, scheduling, low level interrupt handling, NMIs etc. None of
+ * these contexts should ever do memory allocations.
+ *
+ * As regular device interrupt handlers and soft interrupts are forced into
+ * thread context, the existing code which does
+ * spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
+ * just works.
+ *
+ * In theory the BPF locks could be converted to regular spinlocks as well,
+ * but the bucket locks and percpu_freelist locks can be taken from
+ * arbitrary contexts (perf, kprobes, tracepoints) which are required to be
+ * atomic contexts even on RT. These mechanisms require preallocated maps,
+ * so there is no need to invoke memory allocations within the lock held
+ * sections.
+ *
+ * BPF maps which need dynamic allocation are only used from (forced)
+ * thread context on RT and can therefore use regular spinlocks which in
+ * turn allows memory allocations to be invoked from the lock held section.
+ *
+ * On a non RT kernel this distinction is neither possible nor required.
+ * spinlock maps to raw_spinlock and the extra code is optimized out by the
+ * compiler.
*/
struct bucket {
struct hlist_nulls_head head;
- raw_spinlock_t lock;
+ union {
+ raw_spinlock_t raw_lock;
+ spinlock_t lock;
+ };
};
struct bpf_htab {
@@ -87,13 +120,26 @@ struct htab_elem {
char key[0] __aligned(8);
};
+static inline bool htab_is_prealloc(const struct bpf_htab *htab)
+{
+ return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
+}
+
+static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
+{
+ return (!IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab));
+}
+
static void htab_init_buckets(struct bpf_htab *htab)
{
unsigned i;
for (i = 0; i < htab->n_buckets; i++) {
INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
- raw_spin_lock_init(&htab->buckets[i].lock);
+ if (htab_use_raw_lock(htab))
+ raw_spin_lock_init(&htab->buckets[i].raw_lock);
+ else
+ spin_lock_init(&htab->buckets[i].lock);
}
}
@@ -102,7 +148,10 @@ static inline unsigned long htab_lock_bu
{
unsigned long flags;
- raw_spin_lock_irqsave(&b->lock, flags);
+ if (htab_use_raw_lock(htab))
+ raw_spin_lock_irqsave(&b->raw_lock, flags);
+ else
+ spin_lock_irqsave(&b->lock, flags);
return flags;
}
@@ -110,7 +159,10 @@ static inline void htab_unlock_bucket(co
struct bucket *b,
unsigned long flags)
{
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ if (htab_use_raw_lock(htab))
+ raw_spin_unlock_irqrestore(&b->raw_lock, flags);
+ else
+ spin_unlock_irqrestore(&b->lock, flags);
}
static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
@@ -127,11 +179,6 @@ static bool htab_is_percpu(const struct
htab->map.map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
}
-static bool htab_is_prealloc(const struct bpf_htab *htab)
-{
- return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
-}
-
static inline void htab_elem_set_ptr(struct htab_elem *l, u32 key_size,
void __percpu *pptr)
{
* [patch V2 19/20] bpf, lpm: Make locking RT friendly
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (17 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 18/20] bpf: Prepare hashtab locking for PREEMPT_RT Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
2020-02-20 20:45 ` [patch V2 20/20] bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled Thomas Gleixner
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
The LPM trie map cannot be used in contexts like perf, kprobes and tracing
as this map type dynamically allocates memory.
The memory allocation happens with a raw spinlock held, which on a
PREEMPT_RT enabled kernel is a truly spinning lock that disables preemption
and interrupts.
As RT does not allow memory allocation from such a section for various
reasons, convert the raw spinlock to a regular spinlock.
On an RT enabled kernel these locks are substituted by 'sleeping' spinlocks
which provide the proper protection but keep the code preemptible.
On a non-RT kernel regular spinlocks map to raw spinlocks, i.e. this does
not cause any functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/lpm_trie.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -34,7 +34,7 @@ struct lpm_trie {
size_t n_entries;
size_t max_prefixlen;
size_t data_size;
- raw_spinlock_t lock;
+ spinlock_t lock;
};
/* This trie implements a longest prefix match algorithm that can be used to
@@ -315,7 +315,7 @@ static int trie_update_elem(struct bpf_m
if (key->prefixlen > trie->max_prefixlen)
return -EINVAL;
- raw_spin_lock_irqsave(&trie->lock, irq_flags);
+ spin_lock_irqsave(&trie->lock, irq_flags);
/* Allocate and fill a new node */
@@ -422,7 +422,7 @@ static int trie_update_elem(struct bpf_m
kfree(im_node);
}
- raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
+ spin_unlock_irqrestore(&trie->lock, irq_flags);
return ret;
}
@@ -442,7 +442,7 @@ static int trie_delete_elem(struct bpf_m
if (key->prefixlen > trie->max_prefixlen)
return -EINVAL;
- raw_spin_lock_irqsave(&trie->lock, irq_flags);
+ spin_lock_irqsave(&trie->lock, irq_flags);
/* Walk the tree looking for an exact key/length match and keeping
* track of the path we traverse. We will need to know the node
@@ -518,7 +518,7 @@ static int trie_delete_elem(struct bpf_m
kfree_rcu(node, rcu);
out:
- raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
+ spin_unlock_irqrestore(&trie->lock, irq_flags);
return ret;
}
@@ -575,7 +575,7 @@ static struct bpf_map *trie_alloc(union
if (ret)
goto out_err;
- raw_spin_lock_init(&trie->lock);
+ spin_lock_init(&trie->lock);
return &trie->map;
out_err:
* [patch V2 20/20] bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
` (18 preceding siblings ...)
2020-02-20 20:45 ` [patch V2 19/20] bpf, lpm: Make locking RT friendly Thomas Gleixner
@ 2020-02-20 20:45 ` Thomas Gleixner
19 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-20 20:45 UTC (permalink / raw)
To: LKML
Cc: David Miller, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
Sebastian Sewior, Peter Zijlstra, Clark Williams, Steven Rostedt,
Juri Lelli, Ingo Molnar, Mathieu Desnoyers, Vinicius Costa Gomes,
Jakub Kicinski
From: David Miller <davem@davemloft.net>
On an RT kernel down_read_trylock() cannot be used from NMI context, and
up_read_non_owner() is equally problematic.
So in such a configuration, simply elide the annotated stackmap and
just report the raw IPs.
In the longer term, it might be possible to provide atomic friendly
versions of the page cache traversal which will at least provide the
information whether the pages are resident and don't need to be paged in.
[ tglx: Use IS_ENABLED() to avoid the #ifdeffery, fixup the irq work
callback and add a comment ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/bpf/stackmap.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -40,6 +40,9 @@ static void do_up_read(struct irq_work *
{
struct stack_map_irq_work *work;
+ if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
+ return;
+
work = container_of(entry, struct stack_map_irq_work, irq_work);
up_read_non_owner(work->sem);
work->sem = NULL;
@@ -288,10 +291,19 @@ static void stack_map_get_build_id_offse
struct stack_map_irq_work *work = NULL;
if (irqs_disabled()) {
- work = this_cpu_ptr(&up_read_work);
- if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY)
- /* cannot queue more up_read, fallback */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ work = this_cpu_ptr(&up_read_work);
+ if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY) {
+ /* cannot queue more up_read, fallback */
+ irq_work_busy = true;
+ }
+ } else {
+ /*
+ * PREEMPT_RT does not allow to trylock mmap sem in
+ * interrupt disabled context. Force the fallback code.
+ */
irq_work_busy = true;
+ }
}
/*
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs
2020-02-20 20:45 ` [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs Thomas Gleixner
@ 2020-02-22 4:29 ` Alexei Starovoitov
2020-02-22 8:40 ` Thomas Gleixner
2020-02-22 16:44 ` kbuild test robot
1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2020-02-22 4:29 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, David Miller, bpf, netdev, Alexei Starovoitov,
Daniel Borkmann, Sebastian Sewior, Peter Zijlstra,
Clark Williams, Steven Rostedt, Juri Lelli, Ingo Molnar,
Mathieu Desnoyers, Vinicius Costa Gomes, Jakub Kicinski
On Thu, Feb 20, 2020 at 09:45:18PM +0100, Thomas Gleixner wrote:
> The assumption that only programs attached to perf NMI events can deadlock
> on memory allocators is wrong. Assume the following simplified callchain:
>
> kmalloc() from regular non BPF context
> cache empty
> freelist empty
> lock(zone->lock);
> tracepoint or kprobe
> BPF()
> update_elem()
> lock(bucket)
> kmalloc()
> cache empty
> freelist empty
> lock(zone->lock); <- DEADLOCK
>
> There are other ways which do not involve locking to create wreckage:
>
> kmalloc() from regular non BPF context
> local_irq_save();
> ...
> obj = percpu_slab_first();
> kprobe()
> BPF()
> update_elem()
> lock(bucket)
> kmalloc()
> local_irq_save();
> ...
> obj = percpu_slab_first(); <- Same object as above ...
>
> So preallocation _must_ be enforced for all variants of intrusive
> instrumentation.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
> kernel/bpf/verifier.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8144,19 +8144,23 @@ static int check_map_prog_compatibility(
> struct bpf_prog *prog)
>
> {
> - /* Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use
> - * preallocated hash maps, since doing memory allocation
> - * in overflow_handler can crash depending on where nmi got
> - * triggered.
> + /*
> + * Make sure that trace type programs only use preallocated hash
> + * maps. Perf programs obviously can't do memory allocation in NMI
> + * context and all other types can deadlock on a memory allocator
> + * lock when a tracepoint/kprobe triggers a BPF program inside a
> + * lock held region or create inconsistent state when the probe is
> + * within an interrupts disabled critical region in the memory
> + * allocator.
> */
> - if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
> + if ((is_tracing_prog_type(prog->type)) {
This doesn't build.
I assumed the typo somehow sneaked in and proceeded, but it broke
a bunch of tests:
Summary: 1526 PASSED, 0 SKIPPED, 54 FAILED
One can argue that the tests are unsafe and broken.
We used to test all those tests with and without prealloc:
map_flags = 0;
run_all_tests();
map_flags = BPF_F_NO_PREALLOC;
run_all_tests();
Then 4 years ago commit 5aa5bd14c5f866 switched the hashmap to be no_prealloc
always, and that's how it has stayed since then. We can adjust the tests to use
prealloc with tracing progs, but this breakage shows that there could be plenty
of bpf users that also use BPF_F_NO_PREALLOC with tracing. It could simply
be because they know that their kprobes are in a safe spot (and kmalloc is ok)
and they want to save memory. They could be using a large max_entries parameter
for worst-case hash map usage while the typical load is low. In general hash
tables don't perform well beyond 50% load, so prealloc wastes half of the
memory. Since we cannot control where kprobes are placed, I'm not sure what
the right fix is here. It feels like if we proceed with this patch somebody
will complain and we would have to revert, but I'm willing to take that risk
if we cannot come up with an alternative fix.
Going further with the patchset.
Patch 9 "bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites."
adds new warning:
../kernel/seccomp.c: In function ‘seccomp_run_filters’:
../kernel/seccomp.c:272:50: warning: passing argument 2 of ‘bpf_prog_run_pin_on_cpu’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
u32 cur_ret = bpf_prog_run_pin_on_cpu(f->prog, sd);
I fixed it up and proceeded, but patch 16 failed to apply:
Applying: bpf: Factor out hashtab bucket lock operations
error: sha1 information is lacking or useless (kernel/bpf/hashtab.c).
error: could not build fake ancestor
Patch failed at 0001 bpf: Factor out hashtab bucket lock operations
I patched it in manually:
patch -p1 < a.patch
patching file kernel/bpf/hashtab.c
Hunk #1 succeeded at 1333 (offset 14 lines).
Hunk #2 succeeded at 1361 with fuzz 1 (offset 24 lines).
Hunk #3 succeeded at 1372 with fuzz 1 (offset 27 lines).
Hunk #4 succeeded at 1442 (offset 48 lines).
patching file kernel/bpf/syscall.c
and it looks correct.
But patch 17 failed completely:
patch -p1 < b.patch
patching file kernel/bpf/hashtab.c
Hunk #1 succeeded at 88 (offset 1 line).
Hunk #2 succeeded at 374 (offset 12 lines).
Hunk #3 succeeded at 437 (offset 12 lines).
Hunk #4 succeeded at 645 (offset 12 lines).
Hunk #5 succeeded at 653 (offset 12 lines).
Hunk #6 succeeded at 919 (offset 12 lines).
Hunk #7 succeeded at 960 (offset 12 lines).
Hunk #8 succeeded at 998 (offset 12 lines).
Hunk #9 succeeded at 1017 (offset 12 lines).
Hunk #10 succeeded at 1052 (offset 12 lines).
Hunk #11 succeeded at 1075 (offset 12 lines).
Hunk #12 succeeded at 1115 (offset 12 lines).
Hunk #13 succeeded at 1137 (offset 12 lines).
Hunk #14 succeeded at 1175 (offset 12 lines).
Hunk #15 succeeded at 1185 (offset 12 lines).
Hunk #16 succeeded at 1207 (offset 12 lines).
Hunk #17 succeeded at 1216 (offset 12 lines).
Hunk #18 FAILED at 1349.
Hunk #19 FAILED at 1358.
Hunk #20 FAILED at 1366.
Hunk #21 FAILED at 1407.
4 out of 21 hunks FAILED -- saving rejects to file kernel/bpf/hashtab.c.rej
That's where I gave up.
I pulled sched-for-bpf-2020-02-20 branch from tip and pushed it into bpf-next.
Could you please rebase your set on top of bpf-next and repost?
The logic in all patches looks good.
For now I propose to drop patch 1 and get the rest merged while we're
figuring out what to do.
Thanks!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs
2020-02-22 4:29 ` Alexei Starovoitov
@ 2020-02-22 8:40 ` Thomas Gleixner
2020-02-23 22:40 ` Alexei Starovoitov
0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2020-02-22 8:40 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: LKML, David Miller, bpf, netdev, Alexei Starovoitov,
Daniel Borkmann, Sebastian Sewior, Peter Zijlstra,
Clark Williams, Steven Rostedt, Juri Lelli, Ingo Molnar,
Mathieu Desnoyers, Vinicius Costa Gomes, Jakub Kicinski
Alexei,
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Feb 20, 2020 at 09:45:18PM +0100, Thomas Gleixner wrote:
>> The assumption that only programs attached to perf NMI events can deadlock
>> on memory allocators is wrong. Assume the following simplified callchain:
>> */
>> - if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
>> + if ((is_tracing_prog_type(prog->type)) {
>
> This doesn't build.
> I assumed the typo somehow sneaked in and proceeded, but it broke
> a bunch of tests:
> Summary: 1526 PASSED, 0 SKIPPED, 54 FAILED
> One can argue that the tests are unsafe and broken.
> We used to test all those tests with and without prealloc:
> map_flags = 0;
> run_all_tests();
> map_flags = BPF_F_NO_PREALLOC;
> run_all_tests();
> Then 4 years ago commit 5aa5bd14c5f866 switched the hashmap to be no_prealloc
> always, and that's how it has stayed since then. We can adjust the tests to use
> prealloc with tracing progs, but this breakage shows that there could be plenty
> of bpf users that also use BPF_F_NO_PREALLOC with tracing. It could simply
> be because they know that their kprobes are in a safe spot (and kmalloc is ok)
> and they want to save memory. They could be using a large max_entries parameter
> for worst-case hash map usage while the typical load is low. In general hash
> tables don't perform well beyond 50% load, so prealloc wastes half of the
> memory. Since we cannot control where kprobes are placed, I'm not sure what
> the right fix is here. It feels like if we proceed with this patch somebody
> will complain and we would have to revert, but I'm willing to take that risk
> if we cannot come up with an alternative fix.
Having something which is known to be broken exposed is not a good option
either.
Just assume that someone is investigating a kernel issue. BOFH who is
stuck in the 90's uses perf, kprobes and tracepoints. Now he goes on
vacation and the new kid in the team decides to flip that over to BPF.
So now instead of getting information he deadlocks or crashes the
machine.
You can't just tell him, don't do that then. It's broken by design and
you really can't tell which probes are safe and which are not because
the allocator calls out into whatever functions which might look
completely unrelated.
So one way to phase this out would be:
if (is_tracing()) {
if (is_perf() || IS_ENABLED(RT))
return -EINVAL;
WARN_ONCE(.....)
}
And clearly write in the warning that this is dangerous, broken and
about to be forbidden. Hmm?
> Going further with the patchset.
>
> Patch 9 "bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites."
> adds new warning:
> ../kernel/seccomp.c: In function ‘seccomp_run_filters’:
> ../kernel/seccomp.c:272:50: warning: passing argument 2 of ‘bpf_prog_run_pin_on_cpu’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
> u32 cur_ret = bpf_prog_run_pin_on_cpu(f->prog, sd);
Uurgh. I'm sure I fixed that and then I must have lost it again while
reshuffling stuff. Sorry about that.
> That's where I gave up.
Fair enough.
> I pulled sched-for-bpf-2020-02-20 branch from tip and pushed it into bpf-next.
> Could you please rebase your set on top of bpf-next and repost?
> The logic in all patches looks good.
Will do.
Thanks,
tglx
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs
2020-02-20 20:45 ` [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs Thomas Gleixner
2020-02-22 4:29 ` Alexei Starovoitov
@ 2020-02-22 16:44 ` kbuild test robot
1 sibling, 0 replies; 25+ messages in thread
From: kbuild test robot @ 2020-02-22 16:44 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: kbuild-all, LKML
[-- Attachment #1: Type: text/plain, Size: 20677 bytes --]
Hi Thomas,
I love your patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
[also build test ERROR on bpf/master tip/auto-latest linux/master net-next/master net/master linus/master v5.6-rc2 next-20200221]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Thomas-Gleixner/bpf-Make-BPF-and-PREEMPT_RT-co-exist/20200222-080913
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-h003-20200222 (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
kernel/bpf/verifier.c: In function 'check_map_prog_compatibility':
>> kernel/bpf/verifier.c:10194:0: error: unterminated argument list invoking macro "if"
}
>> kernel/bpf/verifier.c:8160:2: error: expected '(' at end of input
if ((is_tracing_prog_type(prog->type)) {
^~
>> kernel/bpf/verifier.c:8160:2: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
kernel/bpf/verifier.c:10194:0: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
}
>> kernel/bpf/verifier.c:8160:2: error: expected declaration or statement at end of input
if ((is_tracing_prog_type(prog->type)) {
^~
kernel/bpf/verifier.c:8160:2: warning: no return statement in function returning non-void [-Wreturn-type]
At top level:
kernel/bpf/verifier.c:8146:12: warning: 'check_map_prog_compatibility' defined but not used [-Wunused-function]
static int check_map_prog_compatibility(struct bpf_verifier_env *env,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:8133:13: warning: 'is_tracing_prog_type' defined but not used [-Wunused-function]
static bool is_tracing_prog_type(enum bpf_prog_type type)
^~~~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:8125:12: warning: 'check_map_prealloc' defined but not used [-Wunused-function]
static int check_map_prealloc(struct bpf_map *map)
^~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:7803:12: warning: 'do_check' defined but not used [-Wunused-function]
static int do_check(struct bpf_verifier_env *env)
^~~~~~~~
kernel/bpf/verifier.c:6979:12: warning: 'check_btf_info' defined but not used [-Wunused-function]
static int check_btf_info(struct bpf_verifier_env *env,
^~~~~~~~~~~~~~
kernel/bpf/verifier.c:6841:13: warning: 'adjust_btf_func' defined but not used [-Wunused-function]
static void adjust_btf_func(struct bpf_verifier_env *env)
^~~~~~~~~~~~~~~
kernel/bpf/verifier.c:6602:12: warning: 'check_cfg' defined but not used [-Wunused-function]
static int check_cfg(struct bpf_verifier_env *env)
^~~~~~~~~
kernel/bpf/verifier.c:2723:12: warning: 'get_callee_stack_depth' defined but not used [-Wunused-function]
static int get_callee_stack_depth(struct bpf_verifier_env *env,
^~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:2665:12: warning: 'check_max_stack_depth' defined but not used [-Wunused-function]
static int check_max_stack_depth(struct bpf_verifier_env *env)
^~~~~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:1400:13: warning: 'insn_has_def32' defined but not used [-Wunused-function]
static bool insn_has_def32(struct bpf_verifier_env *env, struct bpf_insn *insn)
^~~~~~~~~~~~~~
kernel/bpf/verifier.c:1184:12: warning: 'check_subprogs' defined but not used [-Wunused-function]
static int check_subprogs(struct bpf_verifier_env *env)
^~~~~~~~~~~~~~
kernel/bpf/verifier.c:182:13: warning: 'bpf_map_ptr_poisoned' defined but not used [-Wunused-function]
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
^~~~~~~~~~~~~~~~~~~~
vim +/if +10194 kernel/bpf/verifier.c
38207291604401 Martin KaFai Lau 2019-10-24 9991
838e96904ff3fc Yonghong Song 2018-11-19 9992 int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
838e96904ff3fc Yonghong Song 2018-11-19 9993 union bpf_attr __user *uattr)
51580e798cb61b Alexei Starovoitov 2014-09-26 9994 {
06ee7115b0d174 Alexei Starovoitov 2019-04-01 9995 u64 start_time = ktime_get_ns();
58e2af8b3a6b58 Jakub Kicinski 2016-09-21 9996 struct bpf_verifier_env *env;
b9193c1b61ddb9 Martin KaFai Lau 2018-03-24 9997 struct bpf_verifier_log *log;
9e4c24e7ee7dfd Jakub Kicinski 2019-01-22 9998 int i, len, ret = -EINVAL;
e2ae4ca266a1c9 Jakub Kicinski 2019-01-22 9999 bool is_priv;
51580e798cb61b Alexei Starovoitov 2014-09-26 10000
eba0c929d1d0f1 Arnd Bergmann 2017-11-02 10001 /* no program is valid */
eba0c929d1d0f1 Arnd Bergmann 2017-11-02 10002 if (ARRAY_SIZE(bpf_verifier_ops) == 0)
eba0c929d1d0f1 Arnd Bergmann 2017-11-02 10003 return -EINVAL;
eba0c929d1d0f1 Arnd Bergmann 2017-11-02 10004
58e2af8b3a6b58 Jakub Kicinski 2016-09-21 10005 /* 'struct bpf_verifier_env' can be global, but since it's not small,
cbd35700860492 Alexei Starovoitov 2014-09-26 10006 * allocate/free it every time bpf_check() is called
cbd35700860492 Alexei Starovoitov 2014-09-26 10007 */
58e2af8b3a6b58 Jakub Kicinski 2016-09-21 10008 env = kzalloc(sizeof(struct bpf_verifier_env), GFP_KERNEL);
cbd35700860492 Alexei Starovoitov 2014-09-26 10009 if (!env)
cbd35700860492 Alexei Starovoitov 2014-09-26 10010 return -ENOMEM;
61bd5218eef349 Jakub Kicinski 2017-10-09 10011 log = &env->log;
cbd35700860492 Alexei Starovoitov 2014-09-26 10012
9e4c24e7ee7dfd Jakub Kicinski 2019-01-22 10013 len = (*prog)->len;
fad953ce0b22cf Kees Cook 2018-06-12 10014 env->insn_aux_data =
9e4c24e7ee7dfd Jakub Kicinski 2019-01-22 10015 vzalloc(array_size(sizeof(struct bpf_insn_aux_data), len));
3df126f35f88dc Jakub Kicinski 2016-09-21 10016 ret = -ENOMEM;
3df126f35f88dc Jakub Kicinski 2016-09-21 10017 if (!env->insn_aux_data)
3df126f35f88dc Jakub Kicinski 2016-09-21 10018 goto err_free_env;
9e4c24e7ee7dfd Jakub Kicinski 2019-01-22 10019 for (i = 0; i < len; i++)
9e4c24e7ee7dfd Jakub Kicinski 2019-01-22 10020 env->insn_aux_data[i].orig_idx = i;
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10021 env->prog = *prog;
00176a34d9e27a Jakub Kicinski 2017-10-16 10022 env->ops = bpf_verifier_ops[env->prog->type];
45a73c17bfb92c Alexei Starovoitov 2019-04-19 10023 is_priv = capable(CAP_SYS_ADMIN);
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10024
8580ac9404f624 Alexei Starovoitov 2019-10-15 10025 if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
8580ac9404f624 Alexei Starovoitov 2019-10-15 10026 mutex_lock(&bpf_verifier_lock);
8580ac9404f624 Alexei Starovoitov 2019-10-15 10027 if (!btf_vmlinux)
8580ac9404f624 Alexei Starovoitov 2019-10-15 10028 btf_vmlinux = btf_parse_vmlinux();
8580ac9404f624 Alexei Starovoitov 2019-10-15 10029 mutex_unlock(&bpf_verifier_lock);
8580ac9404f624 Alexei Starovoitov 2019-10-15 10030 }
8580ac9404f624 Alexei Starovoitov 2019-10-15 10031
cbd35700860492 Alexei Starovoitov 2014-09-26 10032 /* grab the mutex to protect few globals used by verifier */
45a73c17bfb92c Alexei Starovoitov 2019-04-19 10033 if (!is_priv)
cbd35700860492 Alexei Starovoitov 2014-09-26 10034 mutex_lock(&bpf_verifier_lock);
cbd35700860492 Alexei Starovoitov 2014-09-26 10035
cbd35700860492 Alexei Starovoitov 2014-09-26 10036 if (attr->log_level || attr->log_buf || attr->log_size) {
cbd35700860492 Alexei Starovoitov 2014-09-26 10037 /* user requested verbose verifier output
cbd35700860492 Alexei Starovoitov 2014-09-26 10038 * and supplied buffer to store the verification trace
cbd35700860492 Alexei Starovoitov 2014-09-26 10039 */
e7bf8249e8f1ba Jakub Kicinski 2017-10-09 10040 log->level = attr->log_level;
e7bf8249e8f1ba Jakub Kicinski 2017-10-09 10041 log->ubuf = (char __user *) (unsigned long) attr->log_buf;
e7bf8249e8f1ba Jakub Kicinski 2017-10-09 10042 log->len_total = attr->log_size;
cbd35700860492 Alexei Starovoitov 2014-09-26 10043
cbd35700860492 Alexei Starovoitov 2014-09-26 10044 ret = -EINVAL;
e7bf8249e8f1ba Jakub Kicinski 2017-10-09 10045 /* log attributes have to be sane */
7a9f5c65abcc96 Alexei Starovoitov 2019-04-01 10046 if (log->len_total < 128 || log->len_total > UINT_MAX >> 2 ||
06ee7115b0d174 Alexei Starovoitov 2019-04-01 10047 !log->level || !log->ubuf || log->level & ~BPF_LOG_MASK)
3df126f35f88dc Jakub Kicinski 2016-09-21 10048 goto err_unlock;
cbd35700860492 Alexei Starovoitov 2014-09-26 10049 }
1ad2f5838d345e Daniel Borkmann 2017-05-25 10050
8580ac9404f624 Alexei Starovoitov 2019-10-15 10051 if (IS_ERR(btf_vmlinux)) {
8580ac9404f624 Alexei Starovoitov 2019-10-15 10052 /* Either gcc or pahole or kernel are broken. */
8580ac9404f624 Alexei Starovoitov 2019-10-15 10053 verbose(env, "in-kernel BTF is malformed\n");
8580ac9404f624 Alexei Starovoitov 2019-10-15 10054 ret = PTR_ERR(btf_vmlinux);
38207291604401 Martin KaFai Lau 2019-10-24 10055 goto skip_full_check;
8580ac9404f624 Alexei Starovoitov 2019-10-15 10056 }
8580ac9404f624 Alexei Starovoitov 2019-10-15 10057
1ad2f5838d345e Daniel Borkmann 2017-05-25 10058 env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
1ad2f5838d345e Daniel Borkmann 2017-05-25 10059 if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
e07b98d9bffe41 David S. Miller 2017-05-10 10060 env->strict_alignment = true;
e9ee9efc0d1765 David Miller 2018-11-30 10061 if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
e9ee9efc0d1765 David Miller 2018-11-30 10062 env->strict_alignment = false;
cbd35700860492 Alexei Starovoitov 2014-09-26 10063
e2ae4ca266a1c9 Jakub Kicinski 2019-01-22 10064 env->allow_ptr_leaks = is_priv;
e2ae4ca266a1c9 Jakub Kicinski 2019-01-22 10065
10d274e880eb20 Alexei Starovoitov 2019-08-22 10066 if (is_priv)
10d274e880eb20 Alexei Starovoitov 2019-08-22 10067 env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
10d274e880eb20 Alexei Starovoitov 2019-08-22 10068
f4e3ec0d573e23 Jakub Kicinski 2018-05-03 10069 ret = replace_map_fd_with_map_ptr(env);
f4e3ec0d573e23 Jakub Kicinski 2018-05-03 10070 if (ret < 0)
f4e3ec0d573e23 Jakub Kicinski 2018-05-03 10071 goto skip_full_check;
f4e3ec0d573e23 Jakub Kicinski 2018-05-03 10072
cae1927c0b4a93 Jakub Kicinski 2017-12-27 10073 if (bpf_prog_is_dev_bound(env->prog->aux)) {
a40a26322a83d4 Quentin Monnet 2018-11-09 10074 ret = bpf_prog_offload_verifier_prep(env->prog);
ab3f0063c48c26 Jakub Kicinski 2017-11-03 10075 if (ret)
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10076 goto skip_full_check;
f4e3ec0d573e23 Jakub Kicinski 2018-05-03 10077 }
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10078
dc2a4ebc0b44a2 Alexei Starovoitov 2019-05-21 10079 env->explored_states = kvcalloc(state_htab_size(env),
58e2af8b3a6b58 Jakub Kicinski 2016-09-21 10080 sizeof(struct bpf_verifier_state_list *),
f1bca824dabba4 Alexei Starovoitov 2014-09-29 10081 GFP_USER);
f1bca824dabba4 Alexei Starovoitov 2014-09-29 10082 ret = -ENOMEM;
f1bca824dabba4 Alexei Starovoitov 2014-09-29 10083 if (!env->explored_states)
f1bca824dabba4 Alexei Starovoitov 2014-09-29 10084 goto skip_full_check;
f1bca824dabba4 Alexei Starovoitov 2014-09-29 10085
d9762e84ede3ea Martin KaFai Lau 2018-12-13 10086 ret = check_subprogs(env);
475fb78fbf4859 Alexei Starovoitov 2014-09-26 10087 if (ret < 0)
475fb78fbf4859 Alexei Starovoitov 2014-09-26 10088 goto skip_full_check;
475fb78fbf4859 Alexei Starovoitov 2014-09-26 10089
c454a46b5efd8e Martin KaFai Lau 2018-12-07 10090 ret = check_btf_info(env, attr, uattr);
838e96904ff3fc Yonghong Song 2018-11-19 10091 if (ret < 0)
838e96904ff3fc Yonghong Song 2018-11-19 10092 goto skip_full_check;
838e96904ff3fc Yonghong Song 2018-11-19 10093
be8704ff07d237 Alexei Starovoitov 2020-01-20 10094 ret = check_attach_btf_id(env);
be8704ff07d237 Alexei Starovoitov 2020-01-20 10095 if (ret)
be8704ff07d237 Alexei Starovoitov 2020-01-20 10096 goto skip_full_check;
be8704ff07d237 Alexei Starovoitov 2020-01-20 10097
d9762e84ede3ea Martin KaFai Lau 2018-12-13 10098 ret = check_cfg(env);
d9762e84ede3ea Martin KaFai Lau 2018-12-13 10099 if (ret < 0)
d9762e84ede3ea Martin KaFai Lau 2018-12-13 10100 goto skip_full_check;
d9762e84ede3ea Martin KaFai Lau 2018-12-13 10101
51c39bb1d5d105 Alexei Starovoitov 2020-01-09 10102 ret = do_check_subprogs(env);
51c39bb1d5d105 Alexei Starovoitov 2020-01-09 10103 ret = ret ?: do_check_main(env);
cbd35700860492 Alexei Starovoitov 2014-09-26 10104
c941ce9c282cc6 Quentin Monnet 2018-10-07 10105 if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux))
c941ce9c282cc6 Quentin Monnet 2018-10-07 10106 ret = bpf_prog_offload_finalize(env);
c941ce9c282cc6 Quentin Monnet 2018-10-07 10107
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10108 skip_full_check:
51c39bb1d5d105 Alexei Starovoitov 2020-01-09 10109 kvfree(env->explored_states);
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10110
c131187db2d3fa Alexei Starovoitov 2017-11-22 10111 if (ret == 0)
9b38c4056b2736 Jakub Kicinski 2018-12-19 10112 ret = check_max_stack_depth(env);
c131187db2d3fa Alexei Starovoitov 2017-11-22 10113
9b38c4056b2736 Jakub Kicinski 2018-12-19 10114 /* instruction rewrites happen after this point */
e2ae4ca266a1c9 Jakub Kicinski 2019-01-22 10115 if (is_priv) {
e2ae4ca266a1c9 Jakub Kicinski 2019-01-22 10116 if (ret == 0)
e2ae4ca266a1c9 Jakub Kicinski 2019-01-22 10117 opt_hard_wire_dead_code_branches(env);
52875a04f4b26e Jakub Kicinski 2019-01-22 10118 if (ret == 0)
52875a04f4b26e Jakub Kicinski 2019-01-22 10119 ret = opt_remove_dead_code(env);
a1b14abc009d9c Jakub Kicinski 2019-01-22 10120 if (ret == 0)
a1b14abc009d9c Jakub Kicinski 2019-01-22 10121 ret = opt_remove_nops(env);
52875a04f4b26e Jakub Kicinski 2019-01-22 10122 } else {
70a87ffea8acc3 Alexei Starovoitov 2017-12-25 10123 if (ret == 0)
9b38c4056b2736 Jakub Kicinski 2018-12-19 10124 sanitize_dead_code(env);
52875a04f4b26e Jakub Kicinski 2019-01-22 10125 }
70a87ffea8acc3 Alexei Starovoitov 2017-12-25 10126
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10127 if (ret == 0)
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10128 /* program is valid, convert *(u32*)(ctx + off) accesses */
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10129 ret = convert_ctx_accesses(env);
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10130
e245c5c6a5656e Alexei Starovoitov 2017-03-15 10131 if (ret == 0)
79741b3bdec01a Alexei Starovoitov 2017-03-15 10132 ret = fixup_bpf_calls(env);
e245c5c6a5656e Alexei Starovoitov 2017-03-15 10133
a4b1d3c1ddf6cb Jiong Wang 2019-05-24 10134 /* do 32-bit optimization after insn patching has done so those patched
a4b1d3c1ddf6cb Jiong Wang 2019-05-24 10135 * insns could be handled correctly.
a4b1d3c1ddf6cb Jiong Wang 2019-05-24 10136 */
d6c2308c742a65 Jiong Wang 2019-05-24 10137 if (ret == 0 && !bpf_prog_is_dev_bound(env->prog->aux)) {
d6c2308c742a65 Jiong Wang 2019-05-24 10138 ret = opt_subreg_zext_lo32_rnd_hi32(env, attr);
d6c2308c742a65 Jiong Wang 2019-05-24 10139 env->prog->aux->verifier_zext = bpf_jit_needs_zext() ? !ret
d6c2308c742a65 Jiong Wang 2019-05-24 10140 : false;
a4b1d3c1ddf6cb Jiong Wang 2019-05-24 10141 }
a4b1d3c1ddf6cb Jiong Wang 2019-05-24 10142
1ea47e01ad6ea0 Alexei Starovoitov 2017-12-14 10143 if (ret == 0)
1ea47e01ad6ea0 Alexei Starovoitov 2017-12-14 10144 ret = fixup_call_args(env);
1ea47e01ad6ea0 Alexei Starovoitov 2017-12-14 10145
06ee7115b0d174 Alexei Starovoitov 2019-04-01 10146 env->verification_time = ktime_get_ns() - start_time;
06ee7115b0d174 Alexei Starovoitov 2019-04-01 10147 print_verification_stats(env);
06ee7115b0d174 Alexei Starovoitov 2019-04-01 10148
a2a7d570105254 Jakub Kicinski 2017-10-09 10149 if (log->level && bpf_verifier_log_full(log))
cbd35700860492 Alexei Starovoitov 2014-09-26 10150 ret = -ENOSPC;
a2a7d570105254 Jakub Kicinski 2017-10-09 10151 if (log->level && !log->ubuf) {
cbd35700860492 Alexei Starovoitov 2014-09-26 10152 ret = -EFAULT;
a2a7d570105254 Jakub Kicinski 2017-10-09 10153 goto err_release_maps;
cbd35700860492 Alexei Starovoitov 2014-09-26 10154 }
cbd35700860492 Alexei Starovoitov 2014-09-26 10155
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10156 if (ret == 0 && env->used_map_cnt) {
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10157 /* if program passed verifier, update used_maps in bpf_prog_info */
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10158 env->prog->aux->used_maps = kmalloc_array(env->used_map_cnt,
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10159 sizeof(env->used_maps[0]),
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10160 GFP_KERNEL);
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10161
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10162 if (!env->prog->aux->used_maps) {
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10163 ret = -ENOMEM;
a2a7d570105254 Jakub Kicinski 2017-10-09 10164 goto err_release_maps;
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10165 }
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10166
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10167 memcpy(env->prog->aux->used_maps, env->used_maps,
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10168 sizeof(env->used_maps[0]) * env->used_map_cnt);
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10169 env->prog->aux->used_map_cnt = env->used_map_cnt;
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10170
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10171 /* program is valid. Convert pseudo bpf_ld_imm64 into generic
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10172 * bpf_ld_imm64 instructions
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10173 */
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10174 convert_pseudo_ld_imm64(env);
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10175 }
cbd35700860492 Alexei Starovoitov 2014-09-26 10176
ba64e7d8525236 Yonghong Song 2018-11-24 10177 if (ret == 0)
ba64e7d8525236 Yonghong Song 2018-11-24 10178 adjust_btf_func(env);
ba64e7d8525236 Yonghong Song 2018-11-24 10179
a2a7d570105254 Jakub Kicinski 2017-10-09 10180 err_release_maps:
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10181 if (!env->prog->aux->used_maps)
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10182 /* if we didn't copy map pointers into bpf_prog_info, release
ab7f5bf0928be2 Jakub Kicinski 2018-05-03 10183 * them now. Otherwise free_used_maps() will release them.
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10184 */
0246e64d9a5fcd Alexei Starovoitov 2014-09-26 10185 release_maps(env);
9bac3d6d548e5c Alexei Starovoitov 2015-03-13 10186 *prog = env->prog;
3df126f35f88dc Jakub Kicinski 2016-09-21 10187 err_unlock:
45a73c17bfb92c Alexei Starovoitov 2019-04-19 10188 if (!is_priv)
cbd35700860492 Alexei Starovoitov 2014-09-26 10189 mutex_unlock(&bpf_verifier_lock);
3df126f35f88dc Jakub Kicinski 2016-09-21 10190 vfree(env->insn_aux_data);
3df126f35f88dc Jakub Kicinski 2016-09-21 10191 err_free_env:
3df126f35f88dc Jakub Kicinski 2016-09-21 10192 kfree(env);
51580e798cb61b Alexei Starovoitov 2014-09-26 10193 return ret;
51580e798cb61b Alexei Starovoitov 2014-09-26 @10194 }
:::::: The code at line 10194 was first introduced by commit
:::::: 51580e798cb61b0fc63fa3aa6c5c975375aa0550 bpf: verifier (add docs)
:::::: TO: Alexei Starovoitov <ast@plumgrid.com>
:::::: CC: David S. Miller <davem@davemloft.net>
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 39140 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs
2020-02-22 8:40 ` Thomas Gleixner
@ 2020-02-23 22:40 ` Alexei Starovoitov
0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2020-02-23 22:40 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, David Miller, bpf, netdev, Alexei Starovoitov,
Daniel Borkmann, Sebastian Sewior, Peter Zijlstra,
Clark Williams, Steven Rostedt, Juri Lelli, Ingo Molnar,
Mathieu Desnoyers, Vinicius Costa Gomes, Jakub Kicinski
On Sat, Feb 22, 2020 at 09:40:10AM +0100, Thomas Gleixner wrote:
> Alexei,
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Thu, Feb 20, 2020 at 09:45:18PM +0100, Thomas Gleixner wrote:
> >> The assumption that only programs attached to perf NMI events can deadlock
> >> on memory allocators is wrong. Assume the following simplified callchain:
> >> */
> >> - if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
> >> + if ((is_tracing_prog_type(prog->type)) {
> >
> > This doesn't build.
> > I assumed the typo somehow sneaked in and proceeded, but it broke
> > a bunch of tests:
> > Summary: 1526 PASSED, 0 SKIPPED, 54 FAILED
> > One can argue that the tests are unsafe and broken.
> > We used to test all those tests with and without prealloc:
> > map_flags = 0;
> > run_all_tests();
> > map_flags = BPF_F_NO_PREALLOC;
> > run_all_tests();
> > Then 4 years ago commit 5aa5bd14c5f866 switched hashmap to be no_prealloc
> > always and that how it stayed since then. We can adjust the tests to use
> > prealloc with tracing progs, but this breakage shows that there could be plenty
> > of bpf users that also use BPF_F_NO_PREALLOC with tracing. It could simply
> > be because they know that their kprobes are in a safe spot (and kmalloc is ok)
> > and they want to save memory. They could be using large max_entries parameter
> > for worst-case hash map usage, but typical load is low. In general hashtables
> > don't perform well above 50% utilization, so prealloc wastes half of the memory.
> > Since we cannot control where kprobes are placed, I'm not sure what the right
> > fix is here. It feels that if we proceed with this patch somebody will complain and we
> > would have to revert, but I'm willing to take this risk if we cannot come up
> > with an alternative fix.
>
> Having something which is known to be broken exposed is not a good option
> either.
>
> Just assume that someone is investigating a kernel issue. BOFH who is
> stuck in the 90's uses perf, kprobes and tracepoints. Now he goes on
> vacation and the new kid in the team decides to flip that over to BPF.
> So now instead of getting information he deadlocks or crashes the
> machine.
>
> You can't just tell him "don't do that then". It's broken by design, and
> you really can't tell which probes are safe and which are not, because
> the allocator calls out into functions which might look completely
> unrelated.
>
> So one way to phase this out would be:
>
> if (is_tracing()) {
> if (is_perf() || IS_ENABLED(RT))
> return -EINVAL;
> WARN_ONCE(.....)
> }
>
> And clearly write in the warning that this is dangerous, broken and
> about to be forbidden. Hmm?
Yeah. Let's start with WARN_ONCE and verbose(env, "dangerous, broken")
so the users see it in the verifier log and people who maintain
servers (like kernel-team-s in fb, goog, etc) see it as well
in their dmesg logs. So the motivation will be on both sides.
Then in a few kernel releases we can flip it to disable.
Or we'll find a way to make it work without pre-allocating.
end of thread, other threads:[~2020-02-23 22:40 UTC | newest]
Thread overview: 25+ messages
2020-02-20 20:45 [patch V2 00/20] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
2020-02-20 20:45 ` [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs Thomas Gleixner
2020-02-22 4:29 ` Alexei Starovoitov
2020-02-22 8:40 ` Thomas Gleixner
2020-02-23 22:40 ` Alexei Starovoitov
2020-02-22 16:44 ` kbuild test robot
2020-02-20 20:45 ` [patch V2 02/20] bpf: Update locking comment in hashtab code Thomas Gleixner
2020-02-20 20:45 ` [patch V2 03/20] bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run() Thomas Gleixner
2020-02-20 20:45 ` [patch V2 04/20] perf/bpf: Remove preempt disable around BPF invocation Thomas Gleixner
2020-02-20 20:45 ` [patch V2 05/20] bpf: Remove recursion prevention from rcu free callback Thomas Gleixner
2020-02-20 20:45 ` [patch V2 06/20] bpf: Dont iterate over possible CPUs with interrupts disabled Thomas Gleixner
2020-02-20 20:45 ` [patch V2 07/20] bpf: Provide bpf_prog_run_pin_on_cpu() helper Thomas Gleixner
2020-02-20 20:45 ` [patch V2 08/20] bpf: Replace cant_sleep() with cant_migrate() Thomas Gleixner
2020-02-20 20:45 ` [patch V2 09/20] bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites Thomas Gleixner
2020-02-20 20:45 ` [patch V2 10/20] trace/bpf: Use migrate disable in trace_call_bpf() Thomas Gleixner
2020-02-20 20:45 ` [patch V2 11/20] bpf/tests: Use migrate disable instead of preempt disable Thomas Gleixner
2020-02-20 20:45 ` [patch V2 12/20] bpf: Use migrate_disable/enabe() in trampoline code Thomas Gleixner
2020-02-20 20:45 ` [patch V2 13/20] bpf: Use migrate_disable/enable in array macros and cgroup/lirc code Thomas Gleixner
2020-02-20 20:45 ` [patch V2 14/20] bpf: Use migrate_disable() in hashtab code Thomas Gleixner
2020-02-20 20:45 ` [patch V2 15/20] bpf: Provide recursion prevention helpers Thomas Gleixner
2020-02-20 20:45 ` [patch V2 16/20] bpf: Replace open coded recursion prevention Thomas Gleixner
2020-02-20 20:45 ` [patch V2 17/20] bpf: Factor out hashtab bucket lock operations Thomas Gleixner
2020-02-20 20:45 ` [patch V2 18/20] bpf: Prepare hashtab locking for PREEMPT_RT Thomas Gleixner
2020-02-20 20:45 ` [patch V2 19/20] bpf, lpm: Make locking RT friendly Thomas Gleixner
2020-02-20 20:45 ` [patch V2 20/20] bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled Thomas Gleixner