BPF Archive on lore.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <williams@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC patch 14/19] bpf: Use migrate_disable() in hashtab code
Date: Fri, 14 Feb 2020 14:11:26 -0500
Message-ID: <20200214191126.lbiusetaxecdl3of@localhost>
In-Reply-To: <20200214161504.325142160@linutronix.de>

On 14-Feb-2020 02:39:31 PM, Thomas Gleixner wrote:
> The required protection is that the caller cannot be migrated to a
> different CPU as these places take either a hash bucket lock or might
> trigger a kprobe inside the memory allocator. Both scenarios can lead to
> deadlocks. The deadlock prevention is per CPU by incrementing a per CPU
> variable which temporarily blocks the invocation of BPF programs from perf
> and kprobes.
> 
> Replace the preempt_disable/enable() pairs with migrate_disable/enable()
> pairs to prepare BPF to work on PREEMPT_RT enabled kernels. On a non-RT
> kernel this maps to preempt_disable/enable(), i.e. no functional change.

Will that _really_ work on RT?

I'm puzzled about what will happen in the following scenario on RT:

Thread A is preempted within, e.g., htab_elem_free_rcu(), and Thread B is
scheduled and runs through a number of tracepoints. Both are on the
same CPU's runqueue:

CPU 1

Thread A is scheduled
(Thread A) htab_elem_free_rcu()
(Thread A)   migrate disable
(Thread A)   __this_cpu_inc(bpf_prog_active); -> per-cpu variable for
                                               deadlock prevention.
Thread A is preempted
Thread B is scheduled
(Thread B) Runs through various tracepoints:
           trace_call_bpf()
           if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
               -> will skip any instrumentation that happens to be on
                  this CPU until...
Thread B is preempted
Thread A is scheduled
(Thread A)  __this_cpu_dec(bpf_prog_active);
(Thread A)  migrate enable

Having all those events silently discarded, depending only on where the
scheduler happened to preempt Thread A, would be quite unexpected from a
user's standpoint. It turns the deadlock prevention mechanism into a
random tracepoint-dropping facility, which is unsettling. One
alternative we could consider is to make this deadlock prevention
nesting counter per-thread rather than per-CPU.

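Also, I don't think using __this_cpu_inc() without preemption or
interrupts disabled is safe: once migrate_disable() no longer implies
preempt_disable(), another task on the same CPU can run between the
load and the store of a non-atomic increment. You'll probably want to
move to this_cpu_inc/dec() instead, which can be heavier on some
architectures.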

Thanks,

Mathieu


> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/bpf/hashtab.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -698,11 +698,11 @@ static void htab_elem_free_rcu(struct rc
>  	 * we're calling kfree, otherwise deadlock is possible if kprobes
>  	 * are placed somewhere inside of slub
>  	 */
> -	preempt_disable();
> +	migrate_disable();
>  	__this_cpu_inc(bpf_prog_active);
>  	htab_elem_free(htab, l);
>  	__this_cpu_dec(bpf_prog_active);
> -	preempt_enable();
> +	migrate_enable();
>  }
>  
>  static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
> @@ -1327,7 +1327,7 @@ static int
>  	}
>  
>  again:
> -	preempt_disable();
> +	migrate_disable();
>  	this_cpu_inc(bpf_prog_active);
>  	rcu_read_lock();
>  again_nocopy:
> @@ -1347,7 +1347,7 @@ static int
>  		raw_spin_unlock_irqrestore(&b->lock, flags);
>  		rcu_read_unlock();
>  		this_cpu_dec(bpf_prog_active);
> -		preempt_enable();
> +		migrate_enable();
>  		goto after_loop;
>  	}
>  
> @@ -1356,7 +1356,7 @@ static int
>  		raw_spin_unlock_irqrestore(&b->lock, flags);
>  		rcu_read_unlock();
>  		this_cpu_dec(bpf_prog_active);
> -		preempt_enable();
> +		migrate_enable();
>  		kvfree(keys);
>  		kvfree(values);
>  		goto alloc;
> @@ -1406,7 +1406,7 @@ static int
>  
>  	rcu_read_unlock();
>  	this_cpu_dec(bpf_prog_active);
> -	preempt_enable();
> +	migrate_enable();
>  	if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
>  	    key_size * bucket_cnt) ||
>  	    copy_to_user(uvalues + total * value_size, values,
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

