bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
Cc: Stanislav Fomichev <sdf@google.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>, Martin Lau <kafai@fb.com>
Subject: Re: [PATCH bpf-next v6 1/9] bpf: implement getsockopt and setsockopt hooks
Date: Tue, 18 Jun 2019 11:41:11 -0700	[thread overview]
Message-ID: <CAADnVQ+kymi+zJww+PfPd4WWhvNA67ynGVTd7oj6jiU+XFeguQ@mail.gmail.com> (raw)
In-Reply-To: <20190618180944.GJ9636@mini-arch>

On Tue, Jun 18, 2019 at 11:09 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 06/18, Stanislav Fomichev wrote:
> > On 06/18, Alexei Starovoitov wrote:
> > > On Mon, Jun 17, 2019 at 11:01:01AM -0700, Stanislav Fomichev wrote:
> > > > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> > > > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
> > > >
> > > > BPF_CGROUP_SETSOCKOPT get a read-only view of the setsockopt arguments.
> > > > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> > > > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
> > > >
> > > > The buffer memory is pre-allocated (because I don't think there is
> > > > a precedent for working with __user memory from bpf). This might be
> > > > slow to do for each {s,g}etsockopt call, that's why I've added
> > > > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> > > > attached to a cgroup. Note, however, that there is a race between
> > > > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> > > > program layout might have changed; this should not be a problem
> > > > because in general there is a race between multiple calls to
> > > > {s,g}etsocktop and user adding/removing bpf progs from a cgroup.
> > > >
> > > > The return code of the BPF program is handled as follows:
> > > > * 0: EPERM
> > > > * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> > > > * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
> > > >      prog exits
> > > >
> > > > Note that if 0 or 2 is returned from BPF program, no further BPF program
> > > > in the cgroup hierarchy is executed. This is in contrast with any existing
> > > > per-cgroup BPF attach_type.
> > >
> > > This is drastically different from all other cgroup-bpf progs.
> > > I think all programs should be executed regardless of return code.
> > > It seems to me that 1 vs 2 difference can be expressed via bpf program logic
> > > instead of return code.
> > >
> > > How about we do what all other cgroup-bpf progs do:
> > > "any no is no. all yes is yes"
> > > Meaning any ret=0 - EPERM back to user.
> > > If all are ret=1 - kernel handles get/set.
> > >
> > > I think the desire to differentiate 1 vs 2 came from ordering issue
> > > on getsockopt.
> > > How about for setsockopt all progs run first and then kernel.
> > > For getsockopt kernel runs first and then all progs.
> > > Then progs will have an ability to overwrite anything the kernel returns.
> > Good idea, makes sense. For getsockopt we'd also need to pass the return
> > value of the kernel getsockopt to let bpf programs override it, but seems
> > doable. Let me play with it a bit; I'll send another version if nothing
> > major comes up.
> >
> > Thanks for another round of review!
> One clarification: we'd still probably need to have 3 return codes for
> setsockopt:
> * any 0 - EPERM
> * all 1 - continue with the kernel path (i.e. apply this sockopt as is)
> * any 2 - return after all BPF hooks are executed (bypass kernel)
>           (any 0 trumps any 2 -> EPERM)
>
> The context is readonly for setsockopt, so it shouldn't be an issue.
> Let me know if you have better idea how to handle that.

I think we don't really need 2.
The progs can reduce optlen to zero (or optname to BPF_EMPTY_SOCKOPT)
and do ret=1.
Then the kernel can see that nothing to be be done and return 0 to user space.
Since parent prog in the chain will be able to see that child prog
set optlen to zero, it will be able to overwrite if necessary.

getsockopt wil be clean as well.
all 1s return whatever was produced by progs to user space.
and progs will be able to see what kernel wanted to return because
the kernel's getsockopt logic ran first.
ret=2 doesn't have any meaning for getsockopt, so nice to keep
setsockopt symmetrical and don't do it there either.

  reply	other threads:[~2019-06-18 18:41 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-17 18:01 [PATCH bpf-next v6 0/9] bpf: getsockopt and setsockopt hooks Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 1/9] bpf: implement " Stanislav Fomichev
2019-06-18 16:31   ` Alexei Starovoitov
2019-06-18 16:49     ` Stanislav Fomichev
2019-06-18 18:09       ` Stanislav Fomichev
2019-06-18 18:41         ` Alexei Starovoitov [this message]
2019-06-18 18:53           ` Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 2/9] bpf: sync bpf.h to tools/ Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 3/9] libbpf: support sockopt hooks Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 4/9] selftests/bpf: test sockopt section name Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 5/9] selftests/bpf: add sockopt test Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 6/9] selftests/bpf: add sockopt test that exercises sk helpers Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 7/9] selftests/bpf: add sockopt test that exercises BPF_F_ALLOW_MULTI Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 8/9] bpf: add sockopt documentation Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 9/9] bpftool: support cgroup sockopt Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQ+kymi+zJww+PfPd4WWhvNA67ynGVTd7oj6jiU+XFeguQ@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@fomichev.me \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).