All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: sdf@google.com
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net,
	David Laight <David.Laight@ACULAB.COM>
Subject: Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
Date: Wed, 17 Jun 2020 10:59:29 -0700	[thread overview]
Message-ID: <20200617175929.7ee2alxyjfxuolw4@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20200617174508.GA246265@google.com>

On Wed, Jun 17, 2020 at 10:45:08AM -0700, sdf@google.com wrote:
> On 06/17, Alexei Starovoitov wrote:
> > On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
> > > Attaching to these hooks can break iptables because its optval is
> > > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > > David also mentioned some SCTP options can be big (around 256k).
> > >
> > > For such optvals we expose only the first PAGE_SIZE bytes to
> > > the BPF program. BPF program has two options:
> > > 1. Set ctx->optlen to 0 to indicate that the BPF's optval
> > >    should be ignored and the kernel should use original userspace
> > >    value.
> > > 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
> > >
> > > v5:
> > > * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
> > > * update the docs accordingly
> > >
> > > v4:
> > > * use temporary buffer to avoid optval == optval_end == NULL;
> > >   this removes the corner case in the verifier that might assume
> > >   non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
> > >
> > > v3:
> > > * don't increase the limit, bypass the argument
> > >
> > > v2:
> > > * proper comments formatting (Jakub Kicinski)
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > Cc: David Laight <David.Laight@ACULAB.COM>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 33 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index 4d76f16524cc..ac53102e244a 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -1276,16 +1276,23 @@ static bool
> > __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> > >
> > >  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int
> > max_optlen)
> > >  {
> > > -	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > > +	if (unlikely(max_optlen < 0))
> > >  		return -EINVAL;
> > >
> > > +	if (unlikely(max_optlen > PAGE_SIZE)) {
> > > +		/* We don't expose optvals that are greater than PAGE_SIZE
> > > +		 * to the BPF program.
> > > +		 */
> > > +		max_optlen = PAGE_SIZE;
> > > +	}
> > > +
> > >  	ctx->optval = kzalloc(max_optlen, GFP_USER);
> > >  	if (!ctx->optval)
> > >  		return -ENOMEM;
> > >
> > >  	ctx->optval_end = ctx->optval + max_optlen;
> > >
> > > -	return 0;
> > > +	return max_optlen;
> > >  }
> > >
> > >  static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
> > > @@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct
> > sock *sk, int *level,
> > >  	 */
> > >  	max_optlen = max_t(int, 16, *optlen);
> > >
> > > -	ret = sockopt_alloc_buf(&ctx, max_optlen);
> > > -	if (ret)
> > > -		return ret;
> > > +	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
> > > +	if (max_optlen < 0)
> > > +		return max_optlen;
> > >
> > >  	ctx.optlen = *optlen;
> > >
> > > -	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
> > > +	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) !=
> > 0) {
> > >  		ret = -EFAULT;
> > >  		goto out;
> > >  	}
> > > @@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct
> > sock *sk, int *level,
> > >  		/* export any potential modifications */
> > >  		*level = ctx.level;
> > >  		*optname = ctx.optname;
> > > -		*optlen = ctx.optlen;
> > > -		*kernel_optval = ctx.optval;
> > > +
> > > +		/* optlen == 0 from BPF indicates that we should
> > > +		 * use original userspace data.
> > > +		 */
> > > +		if (ctx.optlen != 0) {
> > > +			*optlen = ctx.optlen;
> 
> > I think it should be:
> > *optlen = min(ctx.optlen, max_optlen);
> We do have the following (existing) check above:
> 	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
> 		/* optlen is out of bounds */
> 		ret = -EFAULT;
> 	} else {
> 
> So we shouldn't need any min here? Or am I missing something?

ahh. you're right.
Applied to bpf tree.

> > Otherwise when bpf prog doesn't adjust ctx.oplen the kernel will see
> > 4k only in kernel_optval whereas optlen will be > 4k.
> > I suspect iptables sockopt should have crashed at this point.
> > How did you test it?
> The selftests that I've attached in the series. The test is passing
> two pages and for IP_TOS we bypass the value via optlen=0 and
> for IP_FREEBIND we trim the buffer to 1 byte. I think this should
> cover this check here.
> 
> One thing I didn't really test is getsockopt when the kernel
> returns really large buffer (iptables). Right now, the test
> gets 4 bytes (trimmed) from the kernel. I think that's the only
> place that I didn't properly test. I wonder whether I should
> do a real iptables-like setsockopt/getsockopt :-/

would be nice :)

      reply	other threads:[~2020-06-17 17:59 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-17  1:04 [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
2020-06-17  1:04 ` [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
2020-06-17  1:04 ` [PATCH bpf v5 3/3] bpf: document optval > PAGE_SIZE behavior for sockopt hooks Stanislav Fomichev
2020-06-17 17:09 ` [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
2020-06-17 17:45   ` sdf
2020-06-17 17:59     ` Alexei Starovoitov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200617175929.7ee2alxyjfxuolw4@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.