From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	davem@davemloft.net, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, netdev@vger.kernel.org,
	kernel-team@fb.com, mingo@redhat.com, will.deacon@arm.com,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	jannh@google.com
Subject: Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
Date: Mon, 28 Jan 2019 13:37:12 -0800
Message-ID: <20190128213710.vjxnc2eq5rsisgfx@ast-mbp>
In-Reply-To: <20190128084310.GC28467@hirez.programming.kicks-ass.net>

On Mon, Jan 28, 2019 at 09:43:10AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 25, 2019 at 03:42:43PM -0800, Alexei Starovoitov wrote:
> > On Fri, Jan 25, 2019 at 10:10:57AM +0100, Peter Zijlstra wrote:
> 
> > > What about the progs that run from SoftIRQ ? Since that bpf_prog_active
> > > thing isn't inside BPF_PROG_RUN() what is to stop say:
> > > 
> > >    reuseport_select_sock()
> > >      ...
> > >        BPF_PROG_RUN()
> > >          bpf_spin_lock()
> > >         <IRQ>
> > > 	  ...
> > > 	  BPF_PROG_RUN()
> > > 	    bpf_spin_lock() // forever more
> > > 
> > > 	</IRQ>
> > > 
> > > Unless you stick that bpf_prog_active stuff inside BPF_PROG_RUN itself,
> > > I don't see how you can fundamentally avoid this happening (now or in
> > > the future).
> 
> > But your issue above is valid.
> 
> > We don't use bpf_prog_active for networking progs, since we allow
> > for one level of nesting due to the classic SKF_AD_PAY_OFFSET legacy.
> > Also we allow tracing progs to nest with networking progs.
> > People using this actively.
> > Typically it's not an issue, since in networking there is no
> > arbitrary nesting (unlike kprobe/nmi in tracing),
> > but for bpf_spin_lock it can be, since the same map can be shared
> > by networking and tracing progs and above deadlock would be possible:
> > (first BPF_PROG_RUN will be from networking prog, then kprobe+bpf's
> > BPF_PROG_RUN accessing the same map with bpf_spin_lock)
> > 
> > So for now I'm going to allow bpf_spin_lock in networking progs only,
> > since there is no arbitrary nesting there.
> 
> Isn't that still broken? AFAIU networking progs can happen in task
> context (TX) and SoftIRQ context (RX), which can nest.

Sure. The sendmsg side of networking can be interrupted by napi receive.
Both can have bpf progs attached at different points, but napi won't run
while a bpf prog is running, because the bpf prog disables preemption.
Moreover, the whole networking stack can be recursive, and there is an
xmit_recursion counter to check for bad cases.
When bpf progs interact with networking they don't add to that recursion.
All of the *redirect*() helpers do so outside of the bpf preempt-disabled context.
Also, there is no nesting of the same networking prog type:
xdp/tc/lwt/cgroup bpf progs cannot be called recursively by design.
There are no arbitrary entry points, unlike kprobe/tracepoint.
The only nesting is when a socket filter _classic_ bpf prog uses the
legacy SKF_AD_PAY_OFFSET. That calls the flow dissector, which may call a
flow dissector bpf prog. Classic bpf doesn't use bpf maps, so there are no
deadlock issues.
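
To make the nesting hazard from the quoted trace concrete, here is a minimal
userspace sketch. It is not kernel code and not the bpf_spin_lock
implementation: struct sketch_lock, sketch_trylock(), sketch_unlock() and
nested_prog_would_deadlock() are hypothetical names, and a single trylock
attempt stands in for the spin loop. It shows that a non-recursive lock held
by an outer prog can never be taken by a prog nested on the same CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock stored in a map value. */
struct sketch_lock {
	atomic_flag held;
};

/* One attempt to take the lock; a real spin lock would loop until this succeeds. */
static bool sketch_trylock(struct sketch_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void sketch_unlock(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Models an outer prog that holds the lock while a nested prog on the same
 * CPU tries to take it. Returns true if the nested prog could not get the
 * lock, i.e. a real spin loop would never terminate, because the outer prog
 * cannot resume and unlock until the nested prog returns.
 */
static bool nested_prog_would_deadlock(void)
{
	struct sketch_lock lock = { ATOMIC_FLAG_INIT };
	bool outer_got_lock = sketch_trylock(&lock);	/* outer prog locks */
	bool nested_got_lock;

	assert(outer_got_lock);
	(void)outer_got_lock;

	nested_got_lock = sketch_trylock(&lock);	/* nested prog, same CPU */
	sketch_unlock(&lock);
	return !nested_got_lock;
}
```

The sketch only covers same-CPU nesting. Contention between different CPUs
is the normal case a spin lock handles, since the holder keeps running and
eventually unlocks.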

> > And once we figure out the safety concerns for kprobe/tracepoint progs
> > we can enable bpf_spin_lock there too.
> > NMI bpf progs will never have bpf_spin_lock.
> 
> kprobe is like NMI, since it pokes an INT3 instruction which can trigger
> in the middle of IRQ-disabled or even in NMIs. Similar arguments can be
> made for tracepoints, they can happen 'anywhere'.

Exactly. That's why there is bpf_prog_active: to protect the kernel in general
from nested tracing bpf progs.
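
As a rough illustration of the kind of guard bpf_prog_active provides, the
userspace sketch below models a per-CPU "prog active" counter with a plain
global, assuming a single simulated CPU. All names (sketch_prog_active,
sketch_run_guarded(), sketch_outer_prog(), sketch_inner_prog(), struct
sketch_result) are hypothetical and only model the idea: a prog that would
nest inside an already running prog is skipped instead of being run.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a per-CPU counter; this sketch simulates a single CPU. */
static int sketch_prog_active;

typedef void (*sketch_prog_fn)(void *ctx);

/*
 * Runs prog unless another guarded prog is already running on this "CPU".
 * Returns true if prog actually ran, false if it was skipped.
 */
static bool sketch_run_guarded(sketch_prog_fn prog, void *ctx)
{
	bool ran = false;

	if (++sketch_prog_active == 1) {
		prog(ctx);
		ran = true;
	}
	--sketch_prog_active;
	assert(sketch_prog_active >= 0);
	return ran;
}

struct sketch_result {
	int inner_runs;		/* how many times the inner prog ran */
	bool nested_ran;	/* whether the nested attempt was allowed */
};

/* Inner prog: only counts its invocations. */
static void sketch_inner_prog(void *ctx)
{
	*(int *)ctx += 1;
}

/*
 * Outer prog: simulates a probe firing in the middle of its run, which
 * tries to run the inner prog through the same guard.
 */
static void sketch_outer_prog(void *ctx)
{
	struct sketch_result *r = ctx;

	r->nested_ran = sketch_run_guarded(sketch_inner_prog, &r->inner_runs);
}
```

Calling sketch_run_guarded(sketch_outer_prog, &r) runs the outer prog, while
the nested attempt inside it is skipped, so the inner prog never gets a chance
to spin on a lock the outer prog holds. The cost of this design is that the
nested event is dropped rather than handled.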