bpf.vger.kernel.org archive mirror
From: Stanislav Fomichev <sdf@fomichev.me>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>, Martin Lau <kafai@fb.com>
Subject: Re: [PATCH bpf-next v6 1/9] bpf: implement getsockopt and setsockopt hooks
Date: Tue, 18 Jun 2019 11:53:52 -0700
Message-ID: <20190618185352.GK9636@mini-arch>
In-Reply-To: <CAADnVQ+kymi+zJww+PfPd4WWhvNA67ynGVTd7oj6jiU+XFeguQ@mail.gmail.com>

On 06/18, Alexei Starovoitov wrote:
> On Tue, Jun 18, 2019 at 11:09 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 06/18, Stanislav Fomichev wrote:
> > > On 06/18, Alexei Starovoitov wrote:
> > > > On Mon, Jun 17, 2019 at 11:01:01AM -0700, Stanislav Fomichev wrote:
> > > > > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> > > > > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
> > > > >
> > > > > > BPF_CGROUP_SETSOCKOPT gets a read-only view of the setsockopt arguments.
> > > > > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> > > > > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
> > > > >
> > > > > The buffer memory is pre-allocated (because I don't think there is
> > > > > a precedent for working with __user memory from bpf). This might be
> > > > > slow to do for each {s,g}etsockopt call, that's why I've added
> > > > > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> > > > > attached to a cgroup. Note, however, that there is a race between
> > > > > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> > > > > program layout might have changed; this should not be a problem
> > > > > because in general there is a race between multiple calls to
> > > > > > {s,g}etsockopt and user adding/removing bpf progs from a cgroup.
> > > > >
> > > > > The return code of the BPF program is handled as follows:
> > > > > * 0: EPERM
> > > > > * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> > > > > * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
> > > > >      prog exits
> > > > >
> > > > > Note that if 0 or 2 is returned from BPF program, no further BPF program
> > > > > in the cgroup hierarchy is executed. This is in contrast with any existing
> > > > > per-cgroup BPF attach_type.
> > > >
> > > > This is drastically different from all other cgroup-bpf progs.
> > > > I think all programs should be executed regardless of return code.
> > > > It seems to me that 1 vs 2 difference can be expressed via bpf program logic
> > > > instead of return code.
> > > >
> > > > How about we do what all other cgroup-bpf progs do:
> > > > "any no is no. all yes is yes"
> > > > Meaning any ret=0 - EPERM back to user.
> > > > If all are ret=1 - kernel handles get/set.
> > > >
> > > > I think the desire to differentiate 1 vs 2 came from ordering issue
> > > > on getsockopt.
> > > > How about for setsockopt all progs run first and then kernel.
> > > > For getsockopt kernel runs first and then all progs.
> > > > Then progs will have an ability to overwrite anything the kernel returns.
> > > Good idea, makes sense. For getsockopt we'd also need to pass the return
> > > value of the kernel getsockopt to let bpf programs override it, but seems
> > > doable. Let me play with it a bit; I'll send another version if nothing
> > > major comes up.
> > >
> > > Thanks for another round of review!
> > One clarification: we'd still probably need to have 3 return codes for
> > setsockopt:
> > * any 0 - EPERM
> > * all 1 - continue with the kernel path (i.e. apply this sockopt as is)
> > * any 2 - return after all BPF hooks are executed (bypass kernel)
> >           (any 0 trumps any 2 -> EPERM)
> >
> > The context is readonly for setsockopt, so it shouldn't be an issue.
> > Let me know if you have better idea how to handle that.
> 
> I think we don't really need 2.
> The progs can reduce optlen to zero (or optname to BPF_EMPTY_SOCKOPT)
> and do ret=1.
> Then the kernel can see that there is nothing to be done and return 0 to user space.
> Since parent prog in the chain will be able to see that child prog
> set optlen to zero, it will be able to overwrite if necessary.
Ack, optlen=0 sounds good. In that case parent prog can poke into optval
because optval_end still points to the valid end of the data (and, as you
said, can override optlen back if necessary). Thanks!
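
To make the setsockopt flow agreed on above concrete, here is a minimal
userspace sketch in C. It is only a model, not the kernel implementation:
struct sockopt_ctx, run_setsockopt_hooks(), kernel_stub() and the sample
programs are all hypothetical names invented for illustration. It assumes
the programs are listed child first, that every program runs regardless of
return code, that any ret=0 yields EPERM, and that optlen == 0 after all
programs have run means the kernel path is skipped.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the hook context; not the kernel's real struct. */
struct sockopt_ctx {
	int level;
	int optname;
	int optlen;
};

typedef int (*sockopt_prog)(struct sockopt_ctx *ctx);
typedef int (*kernel_setsockopt_fn)(const struct sockopt_ctx *ctx);

int kernel_calls; /* counts how often the stand-in kernel path ran */

/* Stand-in for the kernel's own setsockopt handling. */
int kernel_stub(const struct sockopt_ctx *ctx)
{
	(void)ctx;
	kernel_calls++;
	return 0;
}

/* Sample programs: return 1 for "yes", 0 for "no". */
int prog_allow(struct sockopt_ctx *ctx) { (void)ctx; return 1; }
int prog_deny(struct sockopt_ctx *ctx) { (void)ctx; return 0; }

/* Child asks to bypass the kernel by reducing optlen to zero. */
int prog_bypass(struct sockopt_ctx *ctx) { ctx->optlen = 0; return 1; }

/* Parent sees optlen == 0 and overrides it back. */
int prog_restore(struct sockopt_ctx *ctx)
{
	if (ctx->optlen == 0)
		ctx->optlen = 4;
	return 1;
}

/* Run every program (child first), then decide what happens next. */
int run_setsockopt_hooks(sockopt_prog *progs, size_t nprogs,
			 struct sockopt_ctx *ctx,
			 kernel_setsockopt_fn kernel_path)
{
	int allow = 1;
	size_t i;

	for (i = 0; i < nprogs; i++) {
		/* No early exit: "any no is no, all yes is yes". */
		if (progs[i](ctx) == 0)
			allow = 0;
	}

	if (!allow)
		return -EPERM;
	if (ctx->optlen == 0)
		return 0; /* nothing left for the kernel to do */
	return kernel_path(ctx);
}
```

With { prog_bypass, prog_allow } the kernel stub is never called and the
caller gets 0; with { prog_bypass, prog_restore } the parent puts optlen
back and the kernel path runs as usual.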

> getsockopt will be clean as well.
> all 1s return whatever was produced by progs to user space.
> and progs will be able to see what kernel wanted to return because
> the kernel's getsockopt logic ran first.
> ret=2 doesn't have any meaning for getsockopt, so nice to keep
> setsockopt symmetrical and don't do it there either.
Agreed.
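
The getsockopt ordering can be modelled the same way. Again, this is a
hedged userspace sketch and not kernel code: struct getsockopt_ctx, its
retval field, run_getsockopt_hooks() and the sample functions are
hypothetical. It assumes the kernel path runs first, that its return value
is exposed to the programs so they can override it, and that any ret=0
still turns into EPERM.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the hook context; not the kernel's real struct. */
struct getsockopt_ctx {
	int optname;
	int optlen;
	int retval;      /* what the kernel path returned */
	char optval[16];
};

typedef int (*getsockopt_prog)(struct getsockopt_ctx *ctx);
typedef int (*kernel_getsockopt_fn)(struct getsockopt_ctx *ctx);

/* Stand-in kernel handler that knows the option. */
int kernel_fill(struct getsockopt_ctx *ctx)
{
	memcpy(ctx->optval, "kern", 4);
	ctx->optlen = 4;
	return 0;
}

/* Stand-in kernel handler that does not know the option. */
int kernel_unknown(struct getsockopt_ctx *ctx)
{
	ctx->optlen = 0;
	return -ENOPROTOOPT;
}

int get_pass(struct getsockopt_ctx *ctx) { (void)ctx; return 1; }
int get_reject(struct getsockopt_ctx *ctx) { (void)ctx; return 0; }

/* Overwrites whatever the kernel produced. */
int get_overwrite(struct getsockopt_ctx *ctx)
{
	memcpy(ctx->optval, "bpf!", 4);
	ctx->optlen = 4;
	return 1;
}

/* Handles an option the kernel rejected and clears the error. */
int get_handle_unknown(struct getsockopt_ctx *ctx)
{
	if (ctx->retval == -ENOPROTOOPT) {
		memcpy(ctx->optval, "cust", 4);
		ctx->optlen = 4;
		ctx->retval = 0;
	}
	return 1;
}

/* Kernel first, then every program; programs see and may change retval. */
int run_getsockopt_hooks(kernel_getsockopt_fn kernel_path,
			 getsockopt_prog *progs, size_t nprogs,
			 struct getsockopt_ctx *ctx)
{
	int allow = 1;
	size_t i;

	ctx->retval = kernel_path(ctx);
	for (i = 0; i < nprogs; i++) {
		if (progs[i](ctx) == 0)
			allow = 0;
	}
	return allow ? ctx->retval : -EPERM;
}
```

If all programs return 1, user space gets whatever is left in optval,
optlen and retval after the last program has run.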

Thread overview: 15+ messages
2019-06-17 18:01 [PATCH bpf-next v6 0/9] bpf: getsockopt and setsockopt hooks Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 1/9] bpf: implement " Stanislav Fomichev
2019-06-18 16:31   ` Alexei Starovoitov
2019-06-18 16:49     ` Stanislav Fomichev
2019-06-18 18:09       ` Stanislav Fomichev
2019-06-18 18:41         ` Alexei Starovoitov
2019-06-18 18:53           ` Stanislav Fomichev [this message]
2019-06-17 18:01 ` [PATCH bpf-next v6 2/9] bpf: sync bpf.h to tools/ Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 3/9] libbpf: support sockopt hooks Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 4/9] selftests/bpf: test sockopt section name Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 5/9] selftests/bpf: add sockopt test Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 6/9] selftests/bpf: add sockopt test that exercises sk helpers Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 7/9] selftests/bpf: add sockopt test that exercises BPF_F_ALLOW_MULTI Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 8/9] bpf: add sockopt documentation Stanislav Fomichev
2019-06-17 18:01 ` [PATCH bpf-next v6 9/9] bpftool: support cgroup sockopt Stanislav Fomichev