bpf.vger.kernel.org archive mirror
* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-08 23:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf

On Wed, May 08, 2019 at 02:21:52PM -0700, Eric Dumazet wrote:
> Hi Alexei and Daniel
> 
> I have a question about seccomp.
> 
> It seems that after this patch, seccomp no longer needs a helper
> (seccomp_bpf_load())
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8
> 
> Are we detecting that a particular JIT code needs to call at least one
> function from the kernel at all ?

Currently we don't track such things, and we are trying very hard to avoid
any special cases for classic vs extended.

> If the filter contains self-contained code (no call, just inline
> code), then we could use any room in whole vmalloc space,
> not only from the modules (which is something like 2GB total on x86_64)

I believe there was an effort to let bpf progs and other executable code
live anywhere too, but I lost track of it.
It's not that hard to tweak the x64 jit to emit 64-bit calls to helpers
when the delta between the call insn and a helper is more than the 32 bits
that fit into the call insn. iirc there was even such a patch floating around.

but what motivated your question? do you see the 2GB space being full?!
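
As an illustration of the reach limit mentioned above, here is a minimal
sketch in Python. It only models the arithmetic of an x86-64 `call rel32`
(opcode 0xE8 followed by a signed 32-bit displacement measured from the end
of the call instruction). The three addresses are illustrative assumptions,
not values taken from this thread, and real layouts vary with KASLR.

```python
# Sketch: can a helper be reached with a near call (signed 32-bit displacement)?
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1
CALL_REL32_LEN = 5  # one opcode byte (0xE8) plus a 4-byte displacement


def call_displacement(call_addr: int, target_addr: int) -> int:
    """Displacement is relative to the address of the next instruction."""
    return target_addr - (call_addr + CALL_REL32_LEN)


def fits_rel32(call_addr: int, target_addr: int) -> bool:
    """True if the target is reachable with a 32-bit relative call."""
    return INT32_MIN <= call_displacement(call_addr, target_addr) <= INT32_MAX


# Illustrative addresses only (assumptions for this example).
helper = 0xFFFFFFFF81000000     # a helper somewhere in kernel text
near_call = 0xFFFFFFFFA0000000  # JIT image placed close to kernel text
far_call = 0xFFFFC90000000000   # JIT image placed far away

print(fits_rel32(near_call, helper))  # near placement: a rel32 call suffices
print(fits_rel32(far_call, helper))   # far placement: needs a 64-bit call
```

A JIT image placed outside that window would need the 64-bit call sequence
described above, or it must not call helpers at all.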



* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-08 23:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook

On Wed, May 8, 2019 at 4:09 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> but what motivated your question? do you see the 2GB space being full?!


A customer seems to hit the limit, with about 75,000 threads,
each one having a seccomp filter with 6 pages (plus one guard page
given by vmalloc)
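
The figures above can be checked with a few lines of Python. This is only
back-of-the-envelope arithmetic: the 4 KiB page size is an assumption (the
usual x86_64 value), and the function name is made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def vmalloc_footprint(n_filters: int, pages_per_filter: int,
                      guard_pages: int = 1, page_size: int = PAGE_SIZE) -> int:
    """Address space consumed when every filter gets its own allocation."""
    return n_filters * (pages_per_filter + guard_pages) * page_size


total = vmalloc_footprint(75_000, 6)
print(total)                   # bytes of module address space
print(round(total / 2**30, 3)) # the same figure in GiB
```

With 75,000 filters of 6 pages plus one guard page each, the total comes out
slightly above 2 GiB, which matches the limit being hit.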


* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-09  4:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook, luto, jannh

On Wed, May 08, 2019 at 04:17:29PM -0700, Eric Dumazet wrote:
> A customer seems to hit the limit, with about 75,000 threads,
> each one having a seccomp filter with 6 pages (plus one guard page
> given by vmalloc)

Since cbpf doesn't have an "fd as a program" concept, I suspect
the same program was loaded 75k times. What a waste of kernel memory.
And, no, we're not going to extend or fix cbpf for this.
cbpf is frozen. seccomp needs to start using ebpf.
It can have one program to secure all threads.
If necessary, a single program can be customized via bpf maps
for each thread.
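
The "one program, customized via maps" idea can be modelled in a few lines
of Python. This is purely a conceptual sketch: as the message above implies,
seccomp took only classic BPF at the time, so nothing here corresponds to an
existing kernel interface. The dictionary stands in for a bpf map, and the
names and syscall numbers (x86_64 read=0, write=1, open=2, exit=60) are
chosen for illustration.

```python
from typing import Dict, FrozenSet

ALLOW, DENY = "allow", "deny"

# Stand-in for a bpf map: thread id -> set of permitted syscall numbers.
policy_map: Dict[int, FrozenSet[int]] = {}

# Hypothetical default policy: read, write, exit.
DEFAULT_POLICY = frozenset({0, 1, 60})


def shared_filter(tid: int, syscall_nr: int) -> str:
    """One filter body for all threads; per-thread data comes from the map."""
    allowed = policy_map.get(tid, DEFAULT_POLICY)
    return ALLOW if syscall_nr in allowed else DENY


# Customize thread 42 only; every other thread keeps the default policy.
policy_map[42] = DEFAULT_POLICY | {2}

print(shared_filter(42, 2))  # thread 42 may call open
print(shared_filter(7, 2))   # thread 7 falls back to the default
```

The point of the model is that the code is loaded once, and only the small
per-thread policy entries grow with the number of threads.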



* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-09 10:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook,
	Andy Lutomirski, Jann Horn

On Wed, May 8, 2019 at 9:47 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> Since cbpf doesn't have an "fd as a program" concept, I suspect
> the same program was loaded 75k times. What a waste of kernel memory.
> And, no, we're not going to extend or fix cbpf for this.
> cbpf is frozen. seccomp needs to start using ebpf.
> It can have one program to secure all threads.
> If necessary, a single program can be customized via bpf maps
> for each thread.

Yes, docker seems to have a very generic implementation and should
probably be fixed
( https://github.com/moby/moby/blob/v17.03.2-ce/profiles/seccomp/seccomp.go )


* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-09 10:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook,
	Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 9, 2019 at 3:52 AM Eric Dumazet <edumazet@google.com> wrote:
> Yes, docker seems to have a very generic implementation and should
> probably be fixed
> ( https://github.com/moby/moby/blob/v17.03.2-ce/profiles/seccomp/seccomp.go )

Even if the seccomp program was optimized to a few bytes, it would
still consume at least 2 pages in module vmalloc space,
so the limit in number of concurrent programs would be around 262,144

We might ask seccomp guys to detect that the same program is used, by
maintaining a hash of already loaded ones.
( I see struct seccomp_filter has a @usage refcount_t )
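
The 262,144 figure follows from simple division, sketched here in Python.
The 2 GiB region size is the approximate number quoted earlier in the thread,
the 4 KiB page size is an assumption, and the function name is made up for
this example.

```python
def max_concurrent_programs(region_bytes: int, pages_per_program: int,
                            page_size: int = 4096) -> int:
    """Upper bound on programs when each one needs its own page-aligned slot."""
    return region_bytes // (pages_per_program * page_size)


# One page of code plus one guard page per program, in a 2 GiB region.
limit = max_concurrent_programs(2 * 2**30, 2)
print(limit)
```

In other words, 2^31 bytes divided by 2^13 bytes per program gives 2^18
programs, however small each individual filter is.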


* Re: Question about seccomp / bpf
From: Daniel Borkmann @ 2019-05-09 11:49 UTC (permalink / raw)
  To: Eric Dumazet, Alexei Starovoitov
  Cc: Alexei Starovoitov, netdev, bpf, Kees Cook, Andy Lutomirski,
	Jann Horn, Will Drewry

On 05/09/2019 12:58 PM, Eric Dumazet wrote:
> Even if the seccomp program was optimized to a few bytes, it would
> still consume at least 2 pages in module vmalloc space,
> so the limit in number of concurrent programs would be around 262,144
> 
> We might ask seccomp guys to detect that the same program is used, by
> maintaining a hash of already loaded ones.
> ( I see struct seccomp_filter has a @usage refcount_t )

+1, that would indeed be worth pursuing as a short-term solution.


* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-09 23:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Eric Dumazet, Alexei Starovoitov, netdev, bpf, Kees Cook,
	Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 09, 2019 at 01:49:25PM +0200, Daniel Borkmann wrote:
> On 05/09/2019 12:58 PM, Eric Dumazet wrote:
> > Even if the seccomp program was optimized to a few bytes, it would
> > still consume at least 2 pages in module vmalloc space,
> > so the limit in number of concurrent programs would be around 262,144
> > 
> > We might ask seccomp guys to detect that the same program is used, by
> > maintaining a hash of already loaded ones.
> > ( I see struct seccomp_filter has a @usage refcount_t )
> 
> +1, that would indeed be worth pursuing as a short-term solution.

I'm not sure how that can work. seccomp's prctl accepts a list of insns.
There is no handle.
kernel can keep a hashtable of all progs ever loaded and do a search
in it before loading another one, but that's an ugly hack.
Another alternative is to attach the seccomp prog to the parent task
instead of to N children.
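
For concreteness, the hashtable approach could look roughly like the Python
model below. It is a sketch under stated assumptions, not kernel code:
`ProgCache` and `LoadedProg` are hypothetical names, a program is reduced to
its raw instruction bytes, and the refcount only mirrors the role of the
@usage field mentioned earlier.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LoadedProg:
    insns: bytes       # raw instruction bytes as passed in by user space
    refcount: int = 1  # number of tasks currently using this program


class ProgCache:
    """Table of loaded programs, searched before loading another copy."""

    def __init__(self) -> None:
        # dict hashes the bytes and compares them on lookup, so two programs
        # share an entry only if their instructions are identical.
        self._progs: Dict[bytes, LoadedProg] = {}

    def load(self, insns: bytes) -> LoadedProg:
        prog = self._progs.get(insns)
        if prog is not None:
            prog.refcount += 1  # reuse the existing copy
            return prog
        prog = LoadedProg(insns)
        self._progs[insns] = prog
        return prog

    def release(self, prog: LoadedProg) -> None:
        prog.refcount -= 1
        if prog.refcount == 0:
            del self._progs[prog.insns]  # last user gone, drop the entry

    def __len__(self) -> int:
        return len(self._progs)


cache = ProgCache()
first = cache.load(b"\x06\x00\x00\x00\x00\x00\xff\x7f")
second = cache.load(b"\x06\x00\x00\x00\x00\x00\xff\x7f")
print(first is second, first.refcount, len(cache))
```

Loading the same bytes twice yields one stored program with two users, which
is the saving being discussed. The cost is a lookup on every load and a
global table that must be kept consistent.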



* Re: Question about seccomp / bpf
From: Kees Cook @ 2019-05-09 23:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Eric Dumazet, Alexei Starovoitov, netdev, bpf,
	Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 9, 2019 at 4:30 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> I'm not sure how that can work. seccomp's prctl accepts a list of insns.
> There is no handle.
> kernel can keep a hashtable of all progs ever loaded and do a search
> in it before loading another one, but that's an ugly hack.
> Another alternative is to attach the seccomp prog to the parent task
> instead of to N children.

seccomp's filter is already shared by all the children of whatever
process got the filter attached.

-- 
Kees Cook


* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-09 23:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, netdev, bpf, Kees Cook,
	Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 9, 2019 at 4:30 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 09, 2019 at 01:49:25PM +0200, Daniel Borkmann wrote:
> > On 05/09/2019 12:58 PM, Eric Dumazet wrote:
> > > Even if the seccomp program was optimized to a few bytes, it would
> > > still consume at least 2 pages in module vmalloc space,
> > > so the limit in number of concurrent programs would be around 262,144
> > >
> > > We might ask seccomp guys to detect that the same program is used, by
> > > maintaining a hash of already loaded ones.
> > > ( I see struct seccomp_filter has a @usage refcount_t )
> >
> > +1, that would indeed be worth pursuing as a short-term solution.
>
> I'm not sure how that can work. seccomp's prctl accepts a list of insns.
> There is no handle.
> kernel can keep a hashtable of all progs ever loaded and do a search
> in it before loading another one, but that's an ugly hack.

I guess that if such a hack is doable and can save 2GB of memory, then
it is an acceptable one.


> Another alternative is to attach the seccomp prog to the parent task
> instead of to N children.

seccomp filters are stacked; the parents' filters might be very different.
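
A toy Python model may help show what stacking and sharing mean here. It is
a simplification under stated assumptions: actions are reduced to three
names with a made-up precedence list, `Filter`, `attach` and `evaluate` are
hypothetical, and only the general rule that all stacked filters run and the
most restrictive result wins is taken from seccomp itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy precedence: earlier in the list is more restrictive and wins.
PRECEDENCE = ["kill", "errno", "allow"]


@dataclass(frozen=True)
class Filter:
    decide: Callable[[int], str]     # maps a syscall number to an action
    prev: Optional["Filter"] = None  # filter installed earlier (the parent's)


def attach(current: Optional[Filter], decide: Callable[[int], str]) -> Filter:
    """Installing a filter stacks it on top of whatever was inherited."""
    return Filter(decide, current)


def evaluate(head: Optional[Filter], syscall_nr: int) -> str:
    """Run every filter in the chain and keep the most restrictive action."""
    result = "allow"
    node = head
    while node is not None:
        action = node.decide(syscall_nr)
        if PRECEDENCE.index(action) < PRECEDENCE.index(result):
            result = action
        node = node.prev
    return result


parent = attach(None, lambda nr: "errno" if nr == 2 else "allow")
child_a = parent  # a plain fork shares the parent's chain, no new copy
child_b = attach(parent, lambda nr: "kill" if nr == 59 else "allow")

print(evaluate(child_a, 2), evaluate(child_b, 2), evaluate(child_b, 59))
```

Children that add nothing share the parent's chain as is, while a child that
installs its own filter gets a new head on top of the shared tail. That is
why identical filters installed separately in many tasks are not merged.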


* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-09 23:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Alexei Starovoitov, netdev, bpf, Kees Cook,
	Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 09, 2019 at 04:50:12PM -0700, Eric Dumazet wrote:
> On Thu, May 9, 2019 at 4:30 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > I'm not sure how that can work. seccomp's prctl accepts a list of insns.
> > There is no handle.
> > kernel can keep a hashtable of all progs ever loaded and do a search
> > in it before loading another one, but that's an ugly hack.
> 
> I guess that if such a hack is doable and can save 2GB of memory, then
> it is an acceptable one.

Sounds like user space can and should be fixed first.


