bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
@ 2020-06-08 18:27 Stanislav Fomichev
  2020-06-08 18:27 ` [PATCH bpf v3 2/2] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
  2020-06-13  0:33 ` [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
  0 siblings, 2 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2020-06-08 18:27 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, David Laight

Attaching to these hooks can break iptables because its optval is
usually quite big, or at least bigger than the current PAGE_SIZE limit.
David also mentioned some SCTP options can be big (around 256k).

There are two possible ways to fix it:
1. Increase the limit to match iptables max optval. There is, however,
   no clear upper limit. Technically, iptables can accept up to
   512M of data (not sure how practical it is though).

2. Bypass the value (don't expose to BPF) if it's too big and trigger
   BPF only with level/optname so BPF can still decide whether
   to allow/deny big sockopts.

The initial attempt was implemented using strategy #1. Due to
listed shortcomings, let's switch to strategy #2. When there is
legitimate a real use-case for iptables/SCTP, we can consider increasing
the PAGE_SIZE limit.

v3:
* don't increase the limit, bypass the argument

v2:
* proper comments formatting (Jakub Kicinski)

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/cgroup.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index fdf7836750a3..758082853086 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
 
 static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 {
-	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
+	if (unlikely(max_optlen < 0))
 		return -EINVAL;
 
+	if (unlikely(max_optlen > PAGE_SIZE)) {
+		/* We don't expose optvals that are greater than PAGE_SIZE
+		 * to the BPF program.
+		 */
+		ctx->optval = NULL;
+		ctx->optval_end = NULL;
+		return 0;
+	}
+
 	ctx->optval = kzalloc(max_optlen, GFP_USER);
 	if (!ctx->optval)
 		return -ENOMEM;
@@ -1325,7 +1334,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
+	if (ctx.optval && copy_from_user(ctx.optval, optval, *optlen) != 0) {
 		ret = -EFAULT;
 		goto out;
 	}
@@ -1407,7 +1416,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		if (ctx.optlen > max_optlen)
 			ctx.optlen = max_optlen;
 
-		if (copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
+		if (ctx.optval &&
+		    copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1436,7 +1446,7 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		goto out;
 	}
 
-	if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
+	if ((ctx.optval && copy_to_user(optval, ctx.optval, ctx.optlen)) ||
 	    put_user(ctx.optlen, optlen)) {
 		ret = -EFAULT;
 		goto out;
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH bpf v3 2/2] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed
  2020-06-08 18:27 [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
@ 2020-06-08 18:27 ` Stanislav Fomichev
  2020-06-13  0:33 ` [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2020-06-08 18:27 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

We are relying on the fact, that we can pass > sizeof(int) optvals
to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes).
In the BPF program, we return EPERM if optval is greater than optval_end
(implemented via PTR_TO_PACKET/PTR_TO_PACKET_END) and rely on the verifier
to enforce the fact that this data can not be touched.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/sockopt_sk.c     | 26 +++++++++++++++++++
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 20 ++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
index 2061a6beac0f..eae1c8a1fee0 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
@@ -13,6 +13,7 @@ static int getsetsockopt(void)
 		char cc[16]; /* TCP_CA_NAME_MAX */
 	} buf = {};
 	socklen_t optlen;
+	char *big_buf;
 
 	fd = socket(AF_INET, SOCK_STREAM, 0);
 	if (fd < 0) {
@@ -78,6 +79,31 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	/* IP_FREEBIND - BPF can't access optval when optlen > PAGE_SIZE */
+
+	optlen = getpagesize() * 2;
+	big_buf = calloc(1, optlen);
+	if (!big_buf) {
+		log_err("Couldn't allocate two pages");
+		goto err;
+	}
+
+	err = setsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, optlen);
+	if (err != 0) {
+		log_err("Failed to call setsockopt, ret=%d", err);
+		free(big_buf);
+		goto err;
+	}
+
+	err = getsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, &optlen);
+	if (err != 0) {
+		log_err("Failed to call getsockopt, ret=%d", err);
+		free(big_buf);
+		goto err;
+	}
+
+	free(big_buf);
+
 	/* SO_SNDBUF is overwritten */
 
 	buf.u32 = 0x01010101;
diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index d5a5eeb5fb52..933a2ef9c930 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -51,6 +51,16 @@ int _getsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		if (optval > optval_end) {
+			/* For optval > PAGE_SIZE, the actual data
+			 * is not provided.
+			 */
+			return 0; /* EPERM, unexpected data size */
+		}
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
@@ -112,6 +122,16 @@ int _setsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		if (optval > optval_end) {
+			/* For optval > PAGE_SIZE, the actual data
+			 * is not provided.
+			 */
+			return 0; /* EPERM, unexpected data size */
+		}
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-08 18:27 [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
  2020-06-08 18:27 ` [PATCH bpf v3 2/2] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
@ 2020-06-13  0:33 ` Alexei Starovoitov
  2020-06-13  1:03   ` Stanislav Fomichev
  1 sibling, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2020-06-13  0:33 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On Mon, Jun 08, 2020 at 11:27:47AM -0700, Stanislav Fomichev wrote:
> Attaching to these hooks can break iptables because its optval is
> usually quite big, or at least bigger than the current PAGE_SIZE limit.
> David also mentioned some SCTP options can be big (around 256k).
> 
> There are two possible ways to fix it:
> 1. Increase the limit to match iptables max optval. There is, however,
>    no clear upper limit. Technically, iptables can accept up to
>    512M of data (not sure how practical it is though).
> 
> 2. Bypass the value (don't expose to BPF) if it's too big and trigger
>    BPF only with level/optname so BPF can still decide whether
>    to allow/deny big sockopts.
> 
> The initial attempt was implemented using strategy #1. Due to
> listed shortcomings, let's switch to strategy #2. When there is
> legitimate a real use-case for iptables/SCTP, we can consider increasing
> the PAGE_SIZE limit.
> 
> v3:
> * don't increase the limit, bypass the argument
> 
> v2:
> * proper comments formatting (Jakub Kicinski)
> 
> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> Cc: David Laight <David.Laight@ACULAB.COM>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  kernel/bpf/cgroup.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index fdf7836750a3..758082853086 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
>  
>  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
>  {
> -	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> +	if (unlikely(max_optlen < 0))
>  		return -EINVAL;
>  
> +	if (unlikely(max_optlen > PAGE_SIZE)) {
> +		/* We don't expose optvals that are greater than PAGE_SIZE
> +		 * to the BPF program.
> +		 */
> +		ctx->optval = NULL;
> +		ctx->optval_end = NULL;
> +		return 0;
> +	}

It's probably ok, but makes me uneasy about verifier consequences.
ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
I don't think we have such tests. I guess bpf prog won't be able to read
anything and nothing will crash, but having PTR_TO_PACKET that is
actually NULL would be an odd special case to keep in mind for everyone
who will work on the verifier from now on.

Also consider bpf prog that simply reads something small like 4 bytes.
IP_FREEBIND sockopt (like your selftest in the patch 2) will have
those 4 bytes, so it's natural for the prog to assume that it can read it.
It will have
p = ctx->optval;
if (p + 4 > ctx->optval_end)
 /* goto out and don't bother logging, since that never happens */
*(u32*)p;

but 'clever' user space would pass long optlen and prog suddenly
'not seeing' the sockopt. It didn't crash, but debugging would be
surprising.

I feel it's better to copy the first 4k and let the program see it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-13  0:33 ` [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
@ 2020-06-13  1:03   ` Stanislav Fomichev
  2020-06-13  3:50     ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2020-06-13  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	David Laight

On Fri, Jun 12, 2020 at 5:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jun 08, 2020 at 11:27:47AM -0700, Stanislav Fomichev wrote:
> > Attaching to these hooks can break iptables because its optval is
> > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > David also mentioned some SCTP options can be big (around 256k).
> >
> > There are two possible ways to fix it:
> > 1. Increase the limit to match iptables max optval. There is, however,
> >    no clear upper limit. Technically, iptables can accept up to
> >    512M of data (not sure how practical it is though).
> >
> > 2. Bypass the value (don't expose to BPF) if it's too big and trigger
> >    BPF only with level/optname so BPF can still decide whether
> >    to allow/deny big sockopts.
> >
> > The initial attempt was implemented using strategy #1. Due to
> > listed shortcomings, let's switch to strategy #2. When there is
> > legitimate a real use-case for iptables/SCTP, we can consider increasing
> > the PAGE_SIZE limit.
> >
> > v3:
> > * don't increase the limit, bypass the argument
> >
> > v2:
> > * proper comments formatting (Jakub Kicinski)
> >
> > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > Cc: David Laight <David.Laight@ACULAB.COM>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  kernel/bpf/cgroup.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index fdf7836750a3..758082853086 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> >
> >  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> >  {
> > -     if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > +     if (unlikely(max_optlen < 0))
> >               return -EINVAL;
> >
> > +     if (unlikely(max_optlen > PAGE_SIZE)) {
> > +             /* We don't expose optvals that are greater than PAGE_SIZE
> > +              * to the BPF program.
> > +              */
> > +             ctx->optval = NULL;
> > +             ctx->optval_end = NULL;
> > +             return 0;
> > +     }
>
> It's probably ok, but makes me uneasy about verifier consequences.
> ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> I don't think we have such tests. I guess bpf prog won't be able to read
> anything and nothing will crash, but having PTR_TO_PACKET that is
> actually NULL would be an odd special case to keep in mind for everyone
> who will work on the verifier from now on.
>
> Also consider bpf prog that simply reads something small like 4 bytes.
> IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> those 4 bytes, so it's natural for the prog to assume that it can read it.
> It will have
> p = ctx->optval;
> if (p + 4 > ctx->optval_end)
>  /* goto out and don't bother logging, since that never happens */
> *(u32*)p;
>
> but 'clever' user space would pass long optlen and prog suddenly
> 'not seeing' the sockopt. It didn't crash, but debugging would be
> surprising.
>
> I feel it's better to copy the first 4k and let the program see it.
Agreed with the IP_FREEBIND example wrt observability, however it's
not clear what to do with the cropped buffer if the bpf program
modifies it.

Consider that huge iptables setsockopts where the usespace passes
PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
Now, if the bpf program modifies the buffer (say, flips some byte), we
are back to square one. We either have to silently discard that buffer
or reallocate/merge. My reasoning with data == NULL, is that at least
there is a clear signal that the program can't access the data (and
can look at optlen to see if the original buffer is indeed non-zero
and maybe deny such requests?).
At this point I'm really starting to think that maybe we should just
vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
upper bound :-/
I'll try to think about this a bit more over the weekend.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-13  1:03   ` Stanislav Fomichev
@ 2020-06-13  3:50     ` Alexei Starovoitov
  2020-06-15 16:41       ` Stanislav Fomichev
  0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2020-06-13  3:50 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	David Laight

On Fri, Jun 12, 2020 at 06:03:03PM -0700, Stanislav Fomichev wrote:
> On Fri, Jun 12, 2020 at 5:34 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jun 08, 2020 at 11:27:47AM -0700, Stanislav Fomichev wrote:
> > > Attaching to these hooks can break iptables because its optval is
> > > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > > David also mentioned some SCTP options can be big (around 256k).
> > >
> > > There are two possible ways to fix it:
> > > 1. Increase the limit to match iptables max optval. There is, however,
> > >    no clear upper limit. Technically, iptables can accept up to
> > >    512M of data (not sure how practical it is though).
> > >
> > > 2. Bypass the value (don't expose to BPF) if it's too big and trigger
> > >    BPF only with level/optname so BPF can still decide whether
> > >    to allow/deny big sockopts.
> > >
> > > The initial attempt was implemented using strategy #1. Due to
> > > listed shortcomings, let's switch to strategy #2. When there is
> > > legitimate a real use-case for iptables/SCTP, we can consider increasing
> > > the PAGE_SIZE limit.
> > >
> > > v3:
> > > * don't increase the limit, bypass the argument
> > >
> > > v2:
> > > * proper comments formatting (Jakub Kicinski)
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > Cc: David Laight <David.Laight@ACULAB.COM>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  kernel/bpf/cgroup.c | 18 ++++++++++++++----
> > >  1 file changed, 14 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index fdf7836750a3..758082853086 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> > >
> > >  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> > >  {
> > > -     if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > > +     if (unlikely(max_optlen < 0))
> > >               return -EINVAL;
> > >
> > > +     if (unlikely(max_optlen > PAGE_SIZE)) {
> > > +             /* We don't expose optvals that are greater than PAGE_SIZE
> > > +              * to the BPF program.
> > > +              */
> > > +             ctx->optval = NULL;
> > > +             ctx->optval_end = NULL;
> > > +             return 0;
> > > +     }
> >
> > It's probably ok, but makes me uneasy about verifier consequences.
> > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > I don't think we have such tests. I guess bpf prog won't be able to read
> > anything and nothing will crash, but having PTR_TO_PACKET that is
> > actually NULL would be an odd special case to keep in mind for everyone
> > who will work on the verifier from now on.
> >
> > Also consider bpf prog that simply reads something small like 4 bytes.
> > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > It will have
> > p = ctx->optval;
> > if (p + 4 > ctx->optval_end)
> >  /* goto out and don't bother logging, since that never happens */
> > *(u32*)p;
> >
> > but 'clever' user space would pass long optlen and prog suddenly
> > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > surprising.
> >
> > I feel it's better to copy the first 4k and let the program see it.
> Agreed with the IP_FREEBIND example wrt observability, however it's
> not clear what to do with the cropped buffer if the bpf program
> modifies it.
> 
> Consider that huge iptables setsockopts where the usespace passes
> PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> Now, if the bpf program modifies the buffer (say, flips some byte), we
> are back to square one. We either have to silently discard that buffer
> or reallocate/merge. My reasoning with data == NULL, is that at least
> there is a clear signal that the program can't access the data (and
> can look at optlen to see if the original buffer is indeed non-zero
> and maybe deny such requests?).
> At this point I'm really starting to think that maybe we should just
> vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> upper bound :-/
> I'll try to think about this a bit more over the weekend.

Yeah. Tough choices.
We can also detect in the verifier whether program accessed ctx->optval
and skip alloc/copy if program didn't touch it, but I suspect in most
case the program would want to read it.
I think vmallocing what optlen said is DoS-able. It's better to
stick with single page.
Let's keep brainstorming.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-13  3:50     ` Alexei Starovoitov
@ 2020-06-15 16:41       ` Stanislav Fomichev
  2020-06-16 23:03         ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2020-06-15 16:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	David Laight

On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
[ .. ]
> > > It's probably ok, but makes me uneasy about verifier consequences.
> > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > > I don't think we have such tests. I guess bpf prog won't be able to read
> > > anything and nothing will crash, but having PTR_TO_PACKET that is
> > > actually NULL would be an odd special case to keep in mind for everyone
> > > who will work on the verifier from now on.
> > >
> > > Also consider bpf prog that simply reads something small like 4 bytes.
> > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > > It will have
> > > p = ctx->optval;
> > > if (p + 4 > ctx->optval_end)
> > >  /* goto out and don't bother logging, since that never happens */
> > > *(u32*)p;
> > >
> > > but 'clever' user space would pass long optlen and prog suddenly
> > > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > > surprising.
> > >
> > > I feel it's better to copy the first 4k and let the program see it.
> > Agreed with the IP_FREEBIND example wrt observability, however it's
> > not clear what to do with the cropped buffer if the bpf program
> > modifies it.
> >
> > Consider that huge iptables setsockopts where the usespace passes
> > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> > Now, if the bpf program modifies the buffer (say, flips some byte), we
> > are back to square one. We either have to silently discard that buffer
> > or reallocate/merge. My reasoning with data == NULL, is that at least
> > there is a clear signal that the program can't access the data (and
> > can look at optlen to see if the original buffer is indeed non-zero
> > and maybe deny such requests?).
> > At this point I'm really starting to think that maybe we should just
> > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> > upper bound :-/
> > I'll try to think about this a bit more over the weekend.
>
> Yeah. Tough choices.
> We can also detect in the verifier whether program accessed ctx->optval
> and skip alloc/copy if program didn't touch it, but I suspect in most
> case the program would want to read it.
> I think vmallocing what optlen said is DoS-able. It's better to
> stick with single page.
> Let's keep brainstorming.
Btw, can we use sleepable bpf for that? As in, do whatever I suggested
in these patches (don't expose optval>PAGE_SIZE via context), but add
a new helper where you can say 'copy x bytes from y offset of the
original optval' (the helper will do sleepable copy_form_user).
That way we have a clean signal to the BPF that the value is too big
(optval==optval_end==NULL) and the user can fallback to the helper to
inspect the value. We can also provide another helper to export new
value for this case.
WDYT?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-15 16:41       ` Stanislav Fomichev
@ 2020-06-16 23:03         ` Alexei Starovoitov
  2020-06-16 23:20           ` Stanislav Fomichev
  0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2020-06-16 23:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	David Laight

On Mon, Jun 15, 2020 at 09:41:38AM -0700, Stanislav Fomichev wrote:
> On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> [ .. ]
> > > > It's probably ok, but makes me uneasy about verifier consequences.
> > > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > > > I don't think we have such tests. I guess bpf prog won't be able to read
> > > > anything and nothing will crash, but having PTR_TO_PACKET that is
> > > > actually NULL would be an odd special case to keep in mind for everyone
> > > > who will work on the verifier from now on.
> > > >
> > > > Also consider bpf prog that simply reads something small like 4 bytes.
> > > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > > > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > > > It will have
> > > > p = ctx->optval;
> > > > if (p + 4 > ctx->optval_end)
> > > >  /* goto out and don't bother logging, since that never happens */
> > > > *(u32*)p;
> > > >
> > > > but 'clever' user space would pass long optlen and prog suddenly
> > > > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > > > surprising.
> > > >
> > > > I feel it's better to copy the first 4k and let the program see it.
> > > Agreed with the IP_FREEBIND example wrt observability, however it's
> > > not clear what to do with the cropped buffer if the bpf program
> > > modifies it.
> > >
> > > Consider that huge iptables setsockopts where the usespace passes
> > > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> > > Now, if the bpf program modifies the buffer (say, flips some byte), we
> > > are back to square one. We either have to silently discard that buffer
> > > or reallocate/merge. My reasoning with data == NULL, is that at least
> > > there is a clear signal that the program can't access the data (and
> > > can look at optlen to see if the original buffer is indeed non-zero
> > > and maybe deny such requests?).
> > > At this point I'm really starting to think that maybe we should just
> > > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> > > upper bound :-/
> > > I'll try to think about this a bit more over the weekend.
> >
> > Yeah. Tough choices.
> > We can also detect in the verifier whether program accessed ctx->optval
> > and skip alloc/copy if program didn't touch it, but I suspect in most
> > case the program would want to read it.
> > I think vmallocing what optlen said is DoS-able. It's better to
> > stick with single page.
> > Let's keep brainstorming.
> Btw, can we use sleepable bpf for that? As in, do whatever I suggested
> in these patches (don't expose optval>PAGE_SIZE via context), but add
> a new helper where you can say 'copy x bytes from y offset of the
> original optval' (the helper will do sleepable copy_form_user).
> That way we have a clean signal to the BPF that the value is too big
> (optval==optval_end==NULL) and the user can fallback to the helper to
> inspect the value. We can also provide another helper to export new
> value for this case.

sleepable will be read-only and with toctou.
I guess this patch is the least evil then ?
But I'm confused with the test in patch 2.
Why does it do 'if (optval > optval_end)' ?
How is that possible when patch 1 makes them equal.
may be another idea:
allocate 4k, copy first 4k into it, but keep ctx.optlen as original.
if bpf prog reads optval and finds it ok in setsockopt,
it can set ctx.optlen = 0
which would mean run the rest of setsockopt handling with original
'__user *optval' and ignore the buffer that was passed to bpf prog.
In case of ctx.optlen < 4k the behavior won't change from bpf prog
and from kernel pov.
When ctx.optlen > 4k and prog didn't adjust it
__cgroup_bpf_run_filter_setsockopt will do ret = -EFAULT;
and reject sockopt.
So bpf prog would be force to do something with large optvals.
yet it will be able to examine first 4k bytes of it.
It's a bit of change in behavior, but I don't think bpf prog is
doing ctx.optlen = 0, since that's more or less the same as
doing ctx.optlen = -2.
or may be use some other indicator ?
And do something similar with getsockopt.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-16 23:03         ` Alexei Starovoitov
@ 2020-06-16 23:20           ` Stanislav Fomichev
  0 siblings, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2020-06-16 23:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Netdev, bpf, David Miller, Alexei Starovoitov, Daniel Borkmann,
	David Laight

On Tue, Jun 16, 2020 at 4:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jun 15, 2020 at 09:41:38AM -0700, Stanislav Fomichev wrote:
> > On Fri, Jun 12, 2020 at 8:50 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > [ .. ]
> > > > > It's probably ok, but makes me uneasy about verifier consequences.
> > > > > ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
> > > > > Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
> > > > > I don't think we have such tests. I guess bpf prog won't be able to read
> > > > > anything and nothing will crash, but having PTR_TO_PACKET that is
> > > > > actually NULL would be an odd special case to keep in mind for everyone
> > > > > who will work on the verifier from now on.
> > > > >
> > > > > Also consider bpf prog that simply reads something small like 4 bytes.
> > > > > IP_FREEBIND sockopt (like your selftest in the patch 2) will have
> > > > > those 4 bytes, so it's natural for the prog to assume that it can read it.
> > > > > It will have
> > > > > p = ctx->optval;
> > > > > if (p + 4 > ctx->optval_end)
> > > > >  /* goto out and don't bother logging, since that never happens */
> > > > > *(u32*)p;
> > > > >
> > > > > but 'clever' user space would pass long optlen and prog suddenly
> > > > > 'not seeing' the sockopt. It didn't crash, but debugging would be
> > > > > surprising.
> > > > >
> > > > > I feel it's better to copy the first 4k and let the program see it.
> > > > Agreed with the IP_FREEBIND example wrt observability, however it's
> > > > not clear what to do with the cropped buffer if the bpf program
> > > > modifies it.
> > > >
> > > > Consider that huge iptables setsockopts where the usespace passes
> > > > PAGE_SIZE*10 optlen with real data and bpf prog sees only part of it.
> > > > Now, if the bpf program modifies the buffer (say, flips some byte), we
> > > > are back to square one. We either have to silently discard that buffer
> > > > or reallocate/merge. My reasoning with data == NULL, is that at least
> > > > there is a clear signal that the program can't access the data (and
> > > > can look at optlen to see if the original buffer is indeed non-zero
> > > > and maybe deny such requests?).
> > > > At this point I'm really starting to think that maybe we should just
> > > > vmalloc everything that is >PAGE_SIZE and add a sysclt to limit an
> > > > upper bound :-/
> > > > I'll try to think about this a bit more over the weekend.
> > >
> > > Yeah. Tough choices.
> > > We can also detect in the verifier whether program accessed ctx->optval
> > > and skip alloc/copy if program didn't touch it, but I suspect in most
> > > case the program would want to read it.
> > > I think vmallocing what optlen said is DoS-able. It's better to
> > > stick with single page.
> > > Let's keep brainstorming.
> > Btw, can we use sleepable bpf for that? As in, do whatever I suggested
> > in these patches (don't expose optval>PAGE_SIZE via context), but add
> > a new helper where you can say 'copy x bytes from y offset of the
> > original optval' (the helper will do sleepable copy_form_user).
> > That way we have a clean signal to the BPF that the value is too big
> > (optval==optval_end==NULL) and the user can fallback to the helper to
> > inspect the value. We can also provide another helper to export new
> > value for this case.
>
> sleepable will be read-only and with toctou.
> I guess this patch is the least evil then ?
> But I'm confused with the test in patch 2.
> Why does it do 'if (optval > optval_end)' ?
> How is that possible when patch 1 makes them equal.
Right, it should really be 'optval < optval_end', good point!
I want to make sure in the BPF program that we can't access the buffer
(essentially return EPERM to the userpace so the test fails).
Will fix in a follow up.

> may be another idea:
> allocate 4k, copy first 4k into it, but keep ctx.optlen as original.
> if bpf prog reads optval and finds it ok in setsockopt,
> it can set ctx.optlen = 0
> which would mean run the rest of setsockopt handling with original
> '__user *optval' and ignore the buffer that was passed to bpf prog.
> In case of ctx.optlen < 4k the behavior won't change from bpf prog
> and from kernel pov.
> When ctx.optlen > 4k and prog didn't adjust it
> __cgroup_bpf_run_filter_setsockopt will do ret = -EFAULT;
> and reject sockopt.
> So bpf prog would be force to do something with large optvals.
> yet it will be able to examine first 4k bytes of it.
> It's a bit of change in behavior, but I don't think bpf prog is
> doing ctx.optlen = 0, since that's more or less the same as
> doing ctx.optlen = -2.
> or may be use some other indicator ?
> And do something similar with getsockopt.
Good suggestion! That sounds doable in theory, let me try and see if I
hit something unexpected.
I suppose trimming provided optval to zero (optlen=0) is not something
that sensible programs would do?

In this case please ignore the v4 I posted recently!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-06-16 23:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-08 18:27 [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
2020-06-08 18:27 ` [PATCH bpf v3 2/2] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
2020-06-13  0:33 ` [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
2020-06-13  1:03   ` Stanislav Fomichev
2020-06-13  3:50     ` Alexei Starovoitov
2020-06-15 16:41       ` Stanislav Fomichev
2020-06-16 23:03         ` Alexei Starovoitov
2020-06-16 23:20           ` Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).