netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chema Gonzalez <chema@google.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Borkmann <dborkman@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Jiri Olsa <jolsa@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 net-next 1/4] net: flow_dissector: avoid multiple calls in eBPF
Date: Tue, 3 Jun 2014 14:12:19 -0700	[thread overview]
Message-ID: <CA+ZOOTPdDDeGxZVGXVRZmjkxqkGefgDLrawo_9YXbjwTQ_FLzA@mail.gmail.com> (raw)
In-Reply-To: <CAMEtUuxK5hV_eORRLaSHSiwF5X82Ae91vL6W0-6au48v8H6bAA@mail.gmail.com>

On Tue, Jun 3, 2014 at 1:15 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Tue, Jun 3, 2014 at 1:33 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 06/02/2014 06:48 PM, Alexei Starovoitov wrote:
>>>
>>> imo there are pros and cons in Daniel's and Chema's proposals
>>> for classic BPF extensions.
>>> I like Chema's a bit more, since his proposal doesn't require to
>>> change classic BPF verifier and that's always a delicate area
>>> (seccomp also allows to use M[] slots).
>>
>>
>> What bothers me is that you have to *special case* this extension
>> all over the BPF converter and stack, even you need to introduce a
>> prologue just to walk the whole filter program and to check if the
>> extension is present; next to that instead of one extension you
>> need to add a couple of them to uapi. I understand Chema's proposal
>
> Agree. The walking part and multiple anc loads are not clean, but in your
> approach you'd need to hack sk_chk_filter() to recognize very specific
> anc load and do even fancier things in check_load_and_stores()
> to make sure that 'ld #5, ld #keys' is safe from M[] usage point of view.
I think you may be confusing the two parts of the patch, namely
whether we implement the code as 1 anc load with N parameters or as N
anc loads, and the existence of a prologue.

- Adding just 1 anc load ("ld #keys" in your pseudo-code) requires
less UAPI changes, but effectively exports the flow_keys struct to the
user (which means we cannot change it in the future). The fact that
you're proposing your own version of the flow_keys ( {nhoff, nproto,
thoff, tproto, poff} ) does not make it less of a problem: You're just
exporting an alternative (albeit IMO way better one, as all the
dissected info can be obtained from your 5-tuple) flow_keys. From a
usability PoV, the user may have to either do "ld #0; ld #keys"
(knowing that "#0" refers to "nhoff"), or "ld #nhoff". I definitely
prefer the second, but I can easily rewrite the patch to use the
first.

- Now, the existence of a prologue is a must if you want to ensure the
filter only calls the actual flow dissector once (this is the main
goal of the patch, actually). Having 2 insns separated by N insns that
access data obtained from the same flow dissector call means you have
to both (a) ensure the first caller -- whether it's the first or the
second insn -- does perform the BPF_CALL, and (b) ensure that none of
the N insns in the middle mucks with the result of the first call (my
solution deals with (b) by using the stack outside of M[], while yours
requires verifying the code. I find mine easier).

In order to achieve (a), you need the prologue code: Because the code
can have jumps, you can never know whether the "ld #keys" was actually
called or not. What my solution's prologue does is to write a 0 on a
flow_inited u32, which states that the flow dissector hasn't been
called. Now, every call for any of the sub-fields checks this
flow_inited field, and runs the flow dissector iff it's zero (hasn't
been called yet), in which case it also sets flow_inited to 1.

Your approach needs it too. Citing from your pseudo-code:

> ld #5     <-- indicates to fill the first 5 slots of M[], so M[0] to M[4]
> ld #keys  <-- triggers the extension to fill the M[] slots
> ld M[0]   <-- loads nhoff from M[0] into accu

How does the "ld M[0]" know that the actual flow dissector has already
been called? What if the insn just before the "ld #5" was "jmp +2" ?
In that case, the "ld #keys" would have never been called.

> ld M[0]   <-- loads nhoff from M[0] into accu
> <do sth with it>
> ld M[3]   <-- loads tproto into accu, etc

>> or his need to easily access these data but that's why I proposed
>> the above if he wants to go for that. Btw, seccomp doesn't allow
>> for loading BPF extensions, you said so yourself Alexei.
>
> of course, but seccomp does use ld/st M[] which will overlap
> with your socket extension and since check_load_and_stores()
> will be hacked, seccomp verification needs to be considered as well.
> From user api point of view, your approach is cleaner than Chema's,
> but from implementation Chema's is much safer and smaller.
>
> Anyway as I said before I'm not excited about either.
> I don't think we should be adding classic BPF extensions any more.
> The long term headache of supporting classic BPF extensions
> outweighs the short term benefits.
I see a couple of issues with (effectively) freezing classic BPF
development while waiting for direct eBPF access to happen. The first
one is that the kernel has to accept it. I can see many questions
about this, especially security and usability (I'll send an email
about the "split BPF out of core later"). Now, the main issue is
whether/when the tools will support it. IMO, this is useful iff I can
quickly write/reuse filters and run tcpdump filters based on them. I'm
trying to get upstream libpcap to accept support for raw (classic) BPF
filters, and it's taking a long time. I can imagine how they may be
less receptive about supporting a Linux-only eBPF mechanism. Tools do
matter.

Even if eBPF happens, it's not that the extensions are so hard to port
to eBPF: It's one BPF_CALL per extension. And they have a
straightforward porting path to the C-like code.

-Chema


> Not having to constantly tweak kernel for such cases was the key
> goal of eBPF. My two eBPF approaches to solve Chema's need
> are both cleaner, since nothing special is being done in eBPF
> core to support this new need. Both instruction set and verifier
> stay generic and cover tracing and this new socket use at once.
> I do realize that I'm talking here about future eBPF verifier that
> is not in tree yet and eBPF is not exposed to user space, but
> I think we should consider longer term perspective.
> If I'm forced to pick between yours or Chema's classic extensions,
> I would pick Chema's because it's lesser evil :)
>
>>> I think exposing eBPF to user space is not a matter of 'if' but 'when'.
>>>
>>> I'm not sure how pressing it is now to add another extension to classic,
>>> when the goal of this patch set fits into eBPF model just fine.
>>> yes, eBPF verifier is not in tree yet and it will obviously take longer to
>>> review than Chema's or Daniel's set.
>>>
>>> When eBPF is exposed to user space the inner header access can
>>> be done in two ways without changing eBPF instruction set or eBPF
>>> verifier.
>>>
>>> eBPF approach #1:
>>> -- code packet parser in restricted C
>>> Pros:
>>> skb_flow_dissect() stays hidden in kernel.
>>> No additional uapi headache which exist if we start reusing
>>> in-kernel skb_flow_dissect() either via classic or eBPF.
>>> Such eBPF program will be just as fast as in-kernel skb_flow_dissect()
>>> (I provided performance numbers before)
>>> Cons:
>>> in-kernel skb_flow_dissect() is not reused, but mostly copy-pasted to
>>> userspace.
>>>
>>> eBPF approach #2:
>>> -- reuse in-kernel skb_flow_dissect() via bpf_call insn from eBPF program
>>> Pros:
>>> eBPF program becomes much shorter and can be done in asm like:
>>> bpf_mov r2, fp
>>> bpf_sub r2, 64
>>> bpf_call bpf_skb_flow_dissect  // call in-kernel helper function from
>>> eBPF program
>>> bpf_ldx r1, [fp - 64]  // access flow_keys data
>>> bpf_ldx r2, [fp - 60]
>>>
>>> Cons:
>>> bpf_skb_flow_dissect() (wrapper on top of skb_flow_dissect()) becomes
>>> uapi visible helper function
>>>
>>> imo both eBPF approaches are cleaner than extending classic.
>>>
>>

  reply	other threads:[~2014-06-03 21:12 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-30 18:29 [PATCH] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-04-30 22:21 ` Alexei Starovoitov
2014-05-01 10:53 ` Daniel Borkmann
2014-05-01 10:55   ` Daniel Borkmann
2014-05-01 18:44     ` Chema Gonzalez
2014-05-01 18:44 ` [PATCH net-next v2] " Chema Gonzalez
2014-05-02 16:21   ` Ben Hutchings
2014-05-02 21:49   ` David Miller
2014-05-03  0:53     ` Chema Gonzalez
2014-05-03  2:52       ` David Miller
2014-05-05 18:42         ` Chema Gonzalez
2014-05-05 19:12           ` David Miller
2014-05-14 18:42             ` Chema Gonzalez
2014-05-14 18:51               ` Chema Gonzalez
2014-05-14 18:42 ` [PATCH v2 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-14 18:42   ` [PATCH v2 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-14 18:42   ` [PATCH v2 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-14 20:05   ` Alexei Starovoitov
2014-05-14 21:51     ` Chema Gonzalez
2014-05-14 22:44       ` Alexei Starovoitov
2014-05-16 18:41         ` Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-16 18:41 ` [PATCH v5 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-16 22:00   ` Alexei Starovoitov
2014-05-19 22:23     ` Chema Gonzalez
2014-05-20  9:58       ` Daniel Borkmann
2014-05-16 18:41 ` [PATCH v5 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-16 18:41 ` [PATCH v5 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-29 18:55 ` [PATCH v6 net-next 1/4] net: flow_dissector: avoid multiple calls in eBPF Chema Gonzalez
2014-05-29 23:54   ` Daniel Borkmann
2014-05-30 17:12     ` Chema Gonzalez
2014-06-02 12:36       ` Daniel Borkmann
2014-06-02 16:48         ` Alexei Starovoitov
2014-06-03  8:33           ` Daniel Borkmann
2014-06-03 20:15             ` Alexei Starovoitov
2014-06-03 21:12               ` Chema Gonzalez [this message]
2014-06-04  8:51                 ` Daniel Borkmann
2014-06-05  6:55                   ` David Miller
2014-06-20 21:56                   ` Chema Gonzalez
2014-06-24  8:14                     ` Daniel Borkmann
2014-06-25 22:00                       ` Chema Gonzalez
2014-06-27 10:19                         ` Daniel Borkmann
2014-05-29 18:56 ` [PATCH v6 net-next 2/4] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-29 18:56 ` [PATCH v6 net-next 3/4] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-29 18:56 ` [PATCH v6 net-next 4/4] net: filter: minor BPF cleanups Chema Gonzalez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+ZOOTPdDDeGxZVGXVRZmjkxqkGefgDLrawo_9YXbjwTQ_FLzA@mail.gmail.com \
    --to=chema@google.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=ast@plumgrid.com \
    --cc=davem@davemloft.net \
    --cc=dborkman@redhat.com \
    --cc=edumazet@google.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).