linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] seccomp: allow BPF_MOD ALU instructions
@ 2020-03-16 16:36 Anton Protopopov
  2020-03-16 21:23 ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Protopopov @ 2020-03-16 16:36 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Will Drewry, linux-kernel
  Cc: Anton Protopopov, Daniel Borkmann, bpf

The BPF_MOD ALU instructions could be utilized by seccomp classic BPF filters,
but were missing from the explicit list of allowed calls since its introduction
in the original e2cfabdfd075 ("seccomp: add system call filtering using BPF")
commit.  Add support for these instructions by adding them to the allowed list
in the seccomp_check_filter function.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 kernel/seccomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dcb57bf..cae7561b44d4 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -206,6 +206,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 		case BPF_ALU | BPF_MUL | BPF_X:
 		case BPF_ALU | BPF_DIV | BPF_K:
 		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_X:
 		case BPF_ALU | BPF_AND | BPF_K:
 		case BPF_ALU | BPF_AND | BPF_X:
 		case BPF_ALU | BPF_OR | BPF_K:
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2020-03-16 16:36 [PATCH] seccomp: allow BPF_MOD ALU instructions Anton Protopopov
@ 2020-03-16 21:23 ` Kees Cook
  2020-03-16 22:17   ` Anton Protopopov
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2020-03-16 21:23 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Andy Lutomirski, Will Drewry, linux-kernel, Daniel Borkmann, bpf

On Mon, Mar 16, 2020 at 04:36:46PM +0000, Anton Protopopov wrote:
> The BPF_MOD ALU instructions could be utilized by seccomp classic BPF filters,
> but were missing from the explicit list of allowed calls since its introduction
> in the original e2cfabdfd075 ("seccomp: add system call filtering using BPF")
> commit.  Add support for these instructions by adding them to the allowed list
> in the seccomp_check_filter function.
> 
> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>

This has been suggested in the past, but was deemed ultimately redundant:
https://lore.kernel.org/lkml/201908121035.06695C79F@keescook/

Is there a strong reason it's needed?

Thanks!

-Kees

> ---
>  kernel/seccomp.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b6ea3dcb57bf..cae7561b44d4 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -206,6 +206,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>  		case BPF_ALU | BPF_MUL | BPF_X:
>  		case BPF_ALU | BPF_DIV | BPF_K:
>  		case BPF_ALU | BPF_DIV | BPF_X:
> +		case BPF_ALU | BPF_MOD | BPF_K:
> +		case BPF_ALU | BPF_MOD | BPF_X:
>  		case BPF_ALU | BPF_AND | BPF_K:
>  		case BPF_ALU | BPF_AND | BPF_X:
>  		case BPF_ALU | BPF_OR | BPF_K:
> -- 
> 2.19.1

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2020-03-16 21:23 ` Kees Cook
@ 2020-03-16 22:17   ` Anton Protopopov
  2020-03-17 20:20     ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Protopopov @ 2020-03-16 22:17 UTC (permalink / raw)
  To: Kees Cook; +Cc: Andy Lutomirski, Will Drewry, open list, Daniel Borkmann, bpf

пн, 16 мар. 2020 г. в 17:24, Kees Cook <keescook@chromium.org>:
>
> On Mon, Mar 16, 2020 at 04:36:46PM +0000, Anton Protopopov wrote:
> > The BPF_MOD ALU instructions could be utilized by seccomp classic BPF filters,
> > but were missing from the explicit list of allowed calls since its introduction
> > in the original e2cfabdfd075 ("seccomp: add system call filtering using BPF")
> > commit.  Add support for these instructions by adding them to the allowed list
> > in the seccomp_check_filter function.
> >
> > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
>
> This has been suggested in the past, but was deemed ultimately redundant:
> https://lore.kernel.org/lkml/201908121035.06695C79F@keescook/

Yeah, Paul told me this right after I submitted the patch.

> Is there a strong reason it's needed?

I really don't have such a strong need in BPF_MOD, but let me tell why
I wanted to use it in the first place.

I've used this operation to speedup processing linear blacklist
filters. Namely, if you have a list of syscall numbers to blacklist,
you can do, say,

ldw [0]
mod #4
jeq #1, case1
jeq #1, case2
jeq #1, case3
case0:
...

and in every case to walk only a corresponding factor-list. In my case
I had a list of ~40 syscall numbers and after this change filter
executed in 17.25 instructions on average per syscall vs. 45
instructions for the linear filter (so this removes about 30
instructions penalty per every syscall). To replace "mod #4" I
actually used "and #3", but this obviously doesn't work for
non-power-of-two divisors. If I would use "mod 5", then it would give
me about 15.5 instructions on average.

Thanks,
Anton

>
> Thanks!
>
> -Kees
>
> > ---
> >  kernel/seccomp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index b6ea3dcb57bf..cae7561b44d4 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -206,6 +206,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
> >               case BPF_ALU | BPF_MUL | BPF_X:
> >               case BPF_ALU | BPF_DIV | BPF_K:
> >               case BPF_ALU | BPF_DIV | BPF_X:
> > +             case BPF_ALU | BPF_MOD | BPF_K:
> > +             case BPF_ALU | BPF_MOD | BPF_X:
> >               case BPF_ALU | BPF_AND | BPF_K:
> >               case BPF_ALU | BPF_AND | BPF_X:
> >               case BPF_ALU | BPF_OR | BPF_K:
> > --
> > 2.19.1
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2020-03-16 22:17   ` Anton Protopopov
@ 2020-03-17 20:20     ` Kees Cook
  2020-03-18  1:11       ` Anton Protopopov
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2020-03-17 20:20 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Andy Lutomirski, Will Drewry, open list, Daniel Borkmann, bpf

On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> and in every case to walk only a corresponding factor-list. In my case
> I had a list of ~40 syscall numbers and after this change filter
> executed in 17.25 instructions on average per syscall vs. 45
> instructions for the linear filter (so this removes about 30
> instructions penalty per every syscall). To replace "mod #4" I
> actually used "and #3", but this obviously doesn't work for
> non-power-of-two divisors. If I would use "mod 5", then it would give
> me about 15.5 instructions on average.

Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
would mean a process couldn't run on older kernels without some tricks
on the seccomp side.

Since the syscall list is static for a given filter, why not arrange it
as a binary search? That should get even better average instructions
as O(log n) instead of O(n).

Though frankly I've also been considering an ABI version bump for adding
a syscall bitmap feature: the vast majority of seccomp filters are just
binary yes/no across a list of syscalls. Only the special cases need
special handling (arg inspection, fd notification, etc). Then these
kinds of filters could run as O(1).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2020-03-17 20:20     ` Kees Cook
@ 2020-03-18  1:11       ` Anton Protopopov
  2020-03-18  4:06         ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Protopopov @ 2020-03-18  1:11 UTC (permalink / raw)
  To: Kees Cook; +Cc: Andy Lutomirski, Will Drewry, open list, Daniel Borkmann, bpf

вт, 17 мар. 2020 г. в 16:21, Kees Cook <keescook@chromium.org>:
>
> On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > and in every case to walk only a corresponding factor-list. In my case
> > I had a list of ~40 syscall numbers and after this change filter
> > executed in 17.25 instructions on average per syscall vs. 45
> > instructions for the linear filter (so this removes about 30
> > instructions penalty per every syscall). To replace "mod #4" I
> > actually used "and #3", but this obviously doesn't work for
> > non-power-of-two divisors. If I would use "mod 5", then it would give
> > me about 15.5 instructions on average.
>
> Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> would mean a process couldn't run on older kernels without some tricks
> on the seccomp side.

Yes, I understood. Could you tell what would you do exactly if there
was a real need in a new instruction?

> Since the syscall list is static for a given filter, why not arrange it
> as a binary search? That should get even better average instructions
> as O(log n) instead of O(n).

Right, thanks! This saves about 4 more instructions for my case and
works 1-2 ns faster.

> Though frankly I've also been considering an ABI version bump for adding
> a syscall bitmap feature: the vast majority of seccomp filters are just
> binary yes/no across a list of syscalls. Only the special cases need
> special handling (arg inspection, fd notification, etc). Then these
> kinds of filters could run as O(1).
>
> --
> Kees Cook

Thanks,
Anton

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2020-03-18  1:11       ` Anton Protopopov
@ 2020-03-18  4:06         ` Kees Cook
  2020-03-18 15:23           ` Anton Protopopov
  0 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2020-03-18  4:06 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Andy Lutomirski, Will Drewry, open list, Daniel Borkmann, bpf

On Tue, Mar 17, 2020 at 09:11:57PM -0400, Anton Protopopov wrote:
> вт, 17 мар. 2020 г. в 16:21, Kees Cook <keescook@chromium.org>:
> >
> > On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > > and in every case to walk only a corresponding factor-list. In my case
> > > I had a list of ~40 syscall numbers and after this change filter
> > > executed in 17.25 instructions on average per syscall vs. 45
> > > instructions for the linear filter (so this removes about 30
> > > instructions penalty per every syscall). To replace "mod #4" I
> > > actually used "and #3", but this obviously doesn't work for
> > > non-power-of-two divisors. If I would use "mod 5", then it would give
> > > me about 15.5 instructions on average.
> >
> > Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> > would mean a process couldn't run on older kernels without some tricks
> > on the seccomp side.
> 
> Yes, I understood. Could you tell what would you do exactly if there
> was a real need in a new instruction?

I'd likely need to introduce some kind of way to query (and declare) the
"language version" of seccomp filters. New programs would need to
declare the language level (EINVAL would mean the program must support
the original "v1", ENOTSUPP would mean "kernel doesn't support that
level"), and the program would have to build a filter based on the
supported language features. The kernel would assume all undeclared
seccomp users were "v1" and would need to reject BPF_MOD. All programs
declaring "v2" would be allowed to use BPF_MOD.

It's really a lot for something that isn't really needed. :)

> > Since the syscall list is static for a given filter, why not arrange it
> > as a binary search? That should get even better average instructions
> > as O(log n) instead of O(n).
> 
> Right, thanks! This saves about 4 more instructions for my case and
> works 1-2 ns faster.

Excellent!

> > Though frankly I've also been considering an ABI version bump for adding
> > a syscall bitmap feature: the vast majority of seccomp filters are just
> > binary yes/no across a list of syscalls. Only the special cases need
> > special handling (arg inspection, fd notification, etc). Then these
> > kinds of filters could run as O(1).

*This* feature wouldn't need my crazy language version idea, but it
_would_ still need to be detectable, much like how RET_USER_NOTIF was
added.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2020-03-18  4:06         ` Kees Cook
@ 2020-03-18 15:23           ` Anton Protopopov
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Protopopov @ 2020-03-18 15:23 UTC (permalink / raw)
  To: Kees Cook; +Cc: Andy Lutomirski, Will Drewry, open list, Daniel Borkmann, bpf

ср, 18 мар. 2020 г. в 00:06, Kees Cook <keescook@chromium.org>:
>
> On Tue, Mar 17, 2020 at 09:11:57PM -0400, Anton Protopopov wrote:
> > вт, 17 мар. 2020 г. в 16:21, Kees Cook <keescook@chromium.org>:
> > >
> > > On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > > > and in every case to walk only a corresponding factor-list. In my case
> > > > I had a list of ~40 syscall numbers and after this change filter
> > > > executed in 17.25 instructions on average per syscall vs. 45
> > > > instructions for the linear filter (so this removes about 30
> > > > instructions penalty per every syscall). To replace "mod #4" I
> > > > actually used "and #3", but this obviously doesn't work for
> > > > non-power-of-two divisors. If I would use "mod 5", then it would give
> > > > me about 15.5 instructions on average.
> > >
> > > Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> > > would mean a process couldn't run on older kernels without some tricks
> > > on the seccomp side.
> >
> > Yes, I understood. Could you tell what would you do exactly if there
> > was a real need in a new instruction?
>
> I'd likely need to introduce some kind of way to query (and declare) the
> "language version" of seccomp filters. New programs would need to
> declare the language level (EINVAL would mean the program must support
> the original "v1", ENOTSUPP would mean "kernel doesn't support that
> level"), and the program would have to build a filter based on the
> supported language features. The kernel would assume all undeclared
> seccomp users were "v1" and would need to reject BPF_MOD. All programs
> declaring "v2" would be allowed to use BPF_MOD.
>
> It's really a lot for something that isn't really needed. :)

Right :) Thanks for the explanations!

> > > Since the syscall list is static for a given filter, why not arrange it
> > > as a binary search? That should get even better average instructions
> > > as O(log n) instead of O(n).
> >
> > Right, thanks! This saves about 4 more instructions for my case and
> > works 1-2 ns faster.
>
> Excellent!
>
> > > Though frankly I've also been considering an ABI version bump for adding
> > > a syscall bitmap feature: the vast majority of seccomp filters are just
> > > binary yes/no across a list of syscalls. Only the special cases need
> > > special handling (arg inspection, fd notification, etc). Then these
> > > kinds of filters could run as O(1).
>
> *This* feature wouldn't need my crazy language version idea, but it
> _would_ still need to be detectable, much like how RET_USER_NOTIF was
> added.
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2019-08-11  8:58 ` Paul Chaignon
@ 2019-08-12 17:38   ` Kees Cook
  0 siblings, 0 replies; 10+ messages in thread
From: Kees Cook @ 2019-08-12 17:38 UTC (permalink / raw)
  To: Paul Chaignon; +Cc: Andy Lutomirski, Will Drewry, linux-kernel

On Sun, Aug 11, 2019 at 10:58:33AM +0200, Paul Chaignon wrote:
> On Fri, Aug 9, 2019 at 8:26 PM Paul Chaignon <paul.chaignon@orange.com> wrote:
> >
> > We need BPF_MOD to match system calls against whitelists encoded as 32-bit
> > bit arrays.  The selection of the syscall's bit in the appropriate bit
> > array requires a modulo operation such that X = 1 << nr % 32.
> 
> Of course, X = 1 << nr & 0x1F, and we can do without BPF_MOD in our case.
> I'll put that on a lack of sleep...

No worries! Changing the dialect of seccomp BPF isn't something I'd like
to do without really good reason since it creates a split in the filter
correctness from userspace (i.e. a filter using BPF_MOD on an older
kernel will fail). So there would need to be a distinct flag set
somewhere, etc. So, if you do end up discovering later you really want
BPF_MOD, we can figure that out, but for now if you can get by with "&",
that would be best. :)

Thanks!

-Kees

> 
> >
> > Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
> > ---
> >  kernel/seccomp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 811b4a86cdf6..87de6532ff6d 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -205,6 +205,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
> >                 case BPF_ALU | BPF_MUL | BPF_X:
> >                 case BPF_ALU | BPF_DIV | BPF_K:
> >                 case BPF_ALU | BPF_DIV | BPF_X:
> > +               case BPF_ALU | BPF_MOD | BPF_K:
> > +               case BPF_ALU | BPF_MOD | BPF_X:
> >                 case BPF_ALU | BPF_AND | BPF_K:
> >                 case BPF_ALU | BPF_AND | BPF_X:
> >                 case BPF_ALU | BPF_OR | BPF_K:
> > --
> > 2.17.1

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] seccomp: allow BPF_MOD ALU instructions
  2019-08-09 18:26 Paul Chaignon
@ 2019-08-11  8:58 ` Paul Chaignon
  2019-08-12 17:38   ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Chaignon @ 2019-08-11  8:58 UTC (permalink / raw)
  To: Kees Cook; +Cc: Andy Lutomirski, Will Drewry, linux-kernel

On Fri, Aug 9, 2019 at 8:26 PM Paul Chaignon <paul.chaignon@orange.com> wrote:
>
> We need BPF_MOD to match system calls against whitelists encoded as 32-bit
> bit arrays.  The selection of the syscall's bit in the appropriate bit
> array requires a modulo operation such that X = 1 << nr % 32.

Of course, X = 1 << nr & 0x1F, and we can do without BPF_MOD in our case.
I'll put that on a lack of sleep...

>
> Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
> ---
>  kernel/seccomp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 811b4a86cdf6..87de6532ff6d 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -205,6 +205,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>                 case BPF_ALU | BPF_MUL | BPF_X:
>                 case BPF_ALU | BPF_DIV | BPF_K:
>                 case BPF_ALU | BPF_DIV | BPF_X:
> +               case BPF_ALU | BPF_MOD | BPF_K:
> +               case BPF_ALU | BPF_MOD | BPF_X:
>                 case BPF_ALU | BPF_AND | BPF_K:
>                 case BPF_ALU | BPF_AND | BPF_X:
>                 case BPF_ALU | BPF_OR | BPF_K:
> --
> 2.17.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] seccomp: allow BPF_MOD ALU instructions
@ 2019-08-09 18:26 Paul Chaignon
  2019-08-11  8:58 ` Paul Chaignon
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Chaignon @ 2019-08-09 18:26 UTC (permalink / raw)
  To: Kees Cook; +Cc: Andy Lutomirski, Will Drewry, linux-kernel

We need BPF_MOD to match system calls against whitelists encoded as 32-bit
bit arrays.  The selection of the syscall's bit in the appropriate bit
array requires a modulo operation such that X = 1 << nr % 32.

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
---
 kernel/seccomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 811b4a86cdf6..87de6532ff6d 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -205,6 +205,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 		case BPF_ALU | BPF_MUL | BPF_X:
 		case BPF_ALU | BPF_DIV | BPF_K:
 		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_X:
 		case BPF_ALU | BPF_AND | BPF_K:
 		case BPF_ALU | BPF_AND | BPF_X:
 		case BPF_ALU | BPF_OR | BPF_K:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-03-18 15:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-16 16:36 [PATCH] seccomp: allow BPF_MOD ALU instructions Anton Protopopov
2020-03-16 21:23 ` Kees Cook
2020-03-16 22:17   ` Anton Protopopov
2020-03-17 20:20     ` Kees Cook
2020-03-18  1:11       ` Anton Protopopov
2020-03-18  4:06         ` Kees Cook
2020-03-18 15:23           ` Anton Protopopov
  -- strict thread matches above, loose matches on Subject: below --
2019-08-09 18:26 Paul Chaignon
2019-08-11  8:58 ` Paul Chaignon
2019-08-12 17:38   ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).