* Re: Question about seccomp / bpf
  [not found] <CANn89iL_XLb5C-+DY5PRhneZDJv585xfbLtiEVc3-ejzNNXaVg@mail.gmail.com>
From: Alexei Starovoitov @ 2019-05-08 23:09 UTC
To: Eric Dumazet
Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf

On Wed, May 08, 2019 at 02:21:52PM -0700, Eric Dumazet wrote:
> Hi Alexei and Daniel
>
> I have a question about seccomp.
>
> It seems that after this patch, seccomp no longer needs a helper
> (seccomp_bpf_load())
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8
>
> Are we detecting that a particular JIT code needs to call at least one
> function from the kernel at all ?

Currently we don't track such things, and we are trying very hard to avoid
any special cases for classic vs extended.

> If the filter contains self-contained code (no call, just inline
> code), then we could use any room in whole vmalloc space,
> not only from the modules (which is something like 2GB total on x86_64)

I believe there was an effort to allow bpf progs and other executable things
to be placed anywhere too, but I lost track of it.
It's not that hard to tweak the x64 jit to emit 64-bit calls to helpers
when the delta between the call insn and a helper is more than the 32 bits
that fit into the call insn. iirc there was even such a patch floating around.

But what motivated your question? Do you see the 2GB space being full?!
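The remark about 32-bit versus 64-bit calls can be illustrated with a small sketch. On x86-64, a near call (opcode E8) carries a signed 32-bit displacement measured from the end of the 5-byte instruction, so a direct call reaches only targets within roughly 2 GiB in either direction. The function name `fits_rel32` and the addresses below are hypothetical, chosen for illustration; they are not taken from the kernel or from a real memory layout.

```python
CALL_INSN_LEN = 5  # x86-64 near call: 1 opcode byte (E8) + 4 displacement bytes


def fits_rel32(call_addr: int, target_addr: int) -> bool:
    """Return True if target_addr is reachable by a rel32 call at call_addr."""
    # The displacement is relative to the address of the next instruction.
    delta = target_addr - (call_addr + CALL_INSN_LEN)
    return -(1 << 31) <= delta <= (1 << 31) - 1


# Illustrative addresses only (assumption, not a real kernel layout).
helper = 0xFFFF_FFFF_8100_0000
near_prog = helper + (1 << 30)  # call site 1 GiB away from the helper
far_prog = helper - (1 << 40)   # call site 1 TiB away from the helper

print(fits_rel32(near_prog, helper))  # True
print(fits_rel32(far_prog, helper))   # False
```

A JIT that wanted to place programs anywhere in the address space would need a different call sequence whenever this check fails, which is the tweak described in the message above.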
* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-08 23:17 UTC
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook

On Wed, May 8, 2019 at 4:09 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> [...]
> But what motivated your question? Do you see the 2GB space being full?!

A customer seems to hit the limit, with about 75,000 threads,
each one having a seccomp filter with 6 pages (plus one guard page
given by vmalloc)
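As a rough check of those figures, the arithmetic can be written out. Two assumptions are made here that the thread does not state exactly: pages are 4 KiB, and the "something like 2GB" module region is taken as exactly 2 GiB (the real size depends on the kernel configuration).

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages
MODULE_SPACE = 2 * 1024**3  # assumption: "something like 2GB" taken as 2 GiB

threads = 75_000
pages_per_filter = 6 + 1    # 6 pages of filter plus 1 vmalloc guard page

used = threads * pages_per_filter * PAGE_SIZE
print(used)                           # 2150400000
print(round(used / MODULE_SPACE, 3))  # 1.001
```

Under these assumptions, 75,000 filters of 7 pages each come to almost exactly the size of the region, which is consistent with the customer hitting the limit at about that thread count.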
* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-09 4:47 UTC
To: Eric Dumazet
Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook, luto, jannh

On Wed, May 08, 2019 at 04:17:29PM -0700, Eric Dumazet wrote:
> [...]
> A customer seems to hit the limit, with about 75,000 threads,
> each one having a seccomp filter with 6 pages (plus one guard page
> given by vmalloc)

Since cbpf doesn't have the "fd as a program" concept, I suspect
the same program was loaded 75k times. What a waste of kernel memory.
And, no, we're not going to extend or fix cbpf for this.
cbpf is frozen. seccomp needs to start using ebpf.
It can have one program to secure all threads.
If necessary, a single program can be customized via bpf maps
for each thread.
* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-09 10:52 UTC
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook, Andy Lutomirski, Jann Horn

On Wed, May 8, 2019 at 9:47 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> [...]
> Since cbpf doesn't have the "fd as a program" concept, I suspect
> the same program was loaded 75k times. What a waste of kernel memory.
> And, no, we're not going to extend or fix cbpf for this.
> cbpf is frozen. seccomp needs to start using ebpf.
> It can have one program to secure all threads.
> If necessary, a single program can be customized via bpf maps
> for each thread.

Yes, docker seems to have a very generic implementation and should
probably be fixed
( https://github.com/moby/moby/blob/v17.03.2-ce/profiles/seccomp/seccomp.go )
* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-09 10:58 UTC
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, Kees Cook, Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 9, 2019 at 3:52 AM Eric Dumazet <edumazet@google.com> wrote:
> [...]
> Yes, docker seems to have a very generic implementation and should
> probably be fixed
> ( https://github.com/moby/moby/blob/v17.03.2-ce/profiles/seccomp/seccomp.go )

Even if the seccomp program was optimized to a few bytes, it would
still consume at least 2 pages in module vmalloc space,
so the limit on the number of concurrent programs would be around 262,144

We might ask the seccomp guys to detect that the same program is used, by
maintaining a hash of already loaded ones.
( I see struct seccomp_filter has a @usage refcount_t )
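The 262,144 figure follows from dividing the module region by the minimum footprint of one program. The sketch below assumes 4 KiB pages and takes the "something like 2GB" region as exactly 2 GiB; it also assumes the 2-page minimum is one page of code plus the vmalloc guard page mentioned earlier in the thread.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages
MODULE_SPACE = 2 * 1024**3  # assumption: "something like 2GB" taken as 2 GiB

min_pages_per_program = 2   # assumed: 1 page of code + 1 guard page

max_programs = MODULE_SPACE // (min_pages_per_program * PAGE_SIZE)
print(max_programs)  # 262144
```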
* Re: Question about seccomp / bpf
From: Daniel Borkmann @ 2019-05-09 11:49 UTC
To: Eric Dumazet, Alexei Starovoitov
Cc: Alexei Starovoitov, netdev, bpf, Kees Cook, Andy Lutomirski, Jann Horn, Will Drewry

On 05/09/2019 12:58 PM, Eric Dumazet wrote:
> [...]
> Even if the seccomp program was optimized to a few bytes, it would
> still consume at least 2 pages in module vmalloc space,
> so the limit on the number of concurrent programs would be around 262,144
>
> We might ask the seccomp guys to detect that the same program is used, by
> maintaining a hash of already loaded ones.
> ( I see struct seccomp_filter has a @usage refcount_t )

+1, that would indeed be worth pursuing as a short-term solution.
* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-09 23:30 UTC
To: Daniel Borkmann
Cc: Eric Dumazet, Alexei Starovoitov, netdev, bpf, Kees Cook, Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 09, 2019 at 01:49:25PM +0200, Daniel Borkmann wrote:
> On 05/09/2019 12:58 PM, Eric Dumazet wrote:
> > [...]
> > We might ask the seccomp guys to detect that the same program is used, by
> > maintaining a hash of already loaded ones.
> > ( I see struct seccomp_filter has a @usage refcount_t )
>
> +1, that would indeed be worth pursuing as a short-term solution.

I'm not sure how that can work. seccomp's prctl accepts a list of insns.
There is no handle.
The kernel can keep a hashtable of all progs ever loaded and do a search
in it before loading another one, but that's an ugly hack.
Another alternative is to attach the seccomp prog to the parent task
instead of to N children.
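The hashtable idea discussed above can be sketched in user-space terms. This is a toy model under stated assumptions, not kernel code: `FilterCache`, `LoadedFilter`, `load`, and `release` are hypothetical names, and each instruction is modelled as a `(code, jt, jf, k)` tuple mirroring the fields of classic BPF's `struct sock_filter`. The cache hashes the instruction list, reuses an existing entry when the contents match, and keeps a reference count so an entry can be dropped when its last user releases it.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Insn = Tuple[int, int, int, int]  # (code, jt, jf, k), as in struct sock_filter


@dataclass(eq=False)
class LoadedFilter:
    insns: Tuple[Insn, ...]
    usage: int = 0  # reference count, in the spirit of the @usage field


class FilterCache:
    """Hypothetical table of already loaded filters, keyed by content hash."""

    def __init__(self) -> None:
        self._by_digest: Dict[bytes, LoadedFilter] = {}

    @staticmethod
    def _digest(insns: Iterable[Insn]) -> bytes:
        # Pack each instruction as u16 code, u8 jt, u8 jf, u32 k.
        raw = b"".join(struct.pack("<HBBI", *insn) for insn in insns)
        return hashlib.sha256(raw).digest()

    def load(self, insns: Iterable[Insn]) -> LoadedFilter:
        insns = tuple(insns)
        key = self._digest(insns)
        entry = self._by_digest.get(key)
        if entry is None:
            entry = LoadedFilter(insns)
            self._by_digest[key] = entry
        elif entry.insns != insns:
            # Digest collision: fall back to an unshared filter.
            entry = LoadedFilter(insns)
        entry.usage += 1
        return entry

    def release(self, flt: LoadedFilter) -> None:
        flt.usage -= 1
        if flt.usage == 0:
            key = self._digest(flt.insns)
            if self._by_digest.get(key) is flt:
                del self._by_digest[key]


# One-instruction filter: BPF_RET | BPF_K returning SECCOMP_RET_ALLOW.
allow_all = [(0x06, 0, 0, 0x7FFF0000)]

cache = FilterCache()
a = cache.load(allow_all)
b = cache.load(list(allow_all))
print(a is b, a.usage)  # True 2
```

The lookup cost and the lifetime management of the table are the parts the message above calls an ugly hack; the sketch only shows that identical instruction lists can be detected without a user-visible handle.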
* Re: Question about seccomp / bpf
From: Kees Cook @ 2019-05-09 23:49 UTC
To: Alexei Starovoitov
Cc: Daniel Borkmann, Eric Dumazet, Alexei Starovoitov, netdev, bpf, Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 9, 2019 at 4:30 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> I'm not sure how that can work. seccomp's prctl accepts a list of insns.
> There is no handle.
> The kernel can keep a hashtable of all progs ever loaded and do a search
> in it before loading another one, but that's an ugly hack.
> Another alternative is to attach the seccomp prog to the parent task
> instead of to N children.

seccomp's filter is already shared by all the children of whatever
process got the filter attached.

-- 
Kees Cook
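A toy model may help show why a filter attached before forking costs nothing extra per child, while the same filter attached separately in each task is stored once per task. `Filter`, `Task`, `attach`, `fork`, and `distinct_filters` are hypothetical names for illustration and greatly simplify the kernel's actual bookkeeping.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(eq=False)
class Filter:
    name: str
    prev: Optional["Filter"] = None  # older filter underneath this one


@dataclass
class Task:
    filter: Optional[Filter] = None

    def attach(self, name: str) -> None:
        # A new filter is stacked on top of whatever is already attached.
        self.filter = Filter(name, prev=self.filter)

    def fork(self) -> "Task":
        # The child refers to the same filter object; nothing is copied.
        return Task(filter=self.filter)


def distinct_filters(tasks: Iterable[Task]) -> int:
    """Count the filter objects reachable from the given tasks."""
    seen = set()
    for task in tasks:
        flt = task.filter
        while flt is not None and id(flt) not in seen:
            seen.add(id(flt))
            flt = flt.prev
    return len(seen)


parent = Task()
parent.attach("base")

# Filter attached once in the parent, then inherited by 1000 children.
shared = [parent.fork() for _ in range(1000)]
print(distinct_filters(shared))  # 1

# Same children, but each one attaches its own copy after forking.
separate = [parent.fork() for _ in range(1000)]
for task in separate:
    task.attach("per-task")
print(distinct_filters(separate))  # 1001
```

The second case corresponds to the situation suspected in this thread, where every thread loads an identical program on its own.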
* Re: Question about seccomp / bpf
From: Eric Dumazet @ 2019-05-09 23:50 UTC
To: Alexei Starovoitov
Cc: Daniel Borkmann, Alexei Starovoitov, netdev, bpf, Kees Cook, Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 9, 2019 at 4:30 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> [...]
> I'm not sure how that can work. seccomp's prctl accepts a list of insns.
> There is no handle.
> The kernel can keep a hashtable of all progs ever loaded and do a search
> in it before loading another one, but that's an ugly hack.

I guess that if such a hack is doable and can save 2GB of memory, then
it is an acceptable one.

> Another alternative is to attach the seccomp prog to the parent task
> instead of to N children.

seccomp filters are stacked; the parents' filters might be very different.
* Re: Question about seccomp / bpf
From: Alexei Starovoitov @ 2019-05-09 23:51 UTC
To: Eric Dumazet
Cc: Daniel Borkmann, Alexei Starovoitov, netdev, bpf, Kees Cook, Andy Lutomirski, Jann Horn, Will Drewry

On Thu, May 09, 2019 at 04:50:12PM -0700, Eric Dumazet wrote:
> [...]
> I guess that if such a hack is doable and can save 2GB of memory, then
> it is an acceptable one.

Sounds like user space can and should be fixed first.
end of thread

Thread overview: 10+ messages
[not found] <CANn89iL_XLb5C-+DY5PRhneZDJv585xfbLtiEVc3-ejzNNXaVg@mail.gmail.com>
2019-05-08 23:09 ` Question about seccomp / bpf Alexei Starovoitov
2019-05-08 23:17   ` Eric Dumazet
2019-05-09  4:47     ` Alexei Starovoitov
2019-05-09 10:52       ` Eric Dumazet
2019-05-09 10:58         ` Eric Dumazet
2019-05-09 11:49           ` Daniel Borkmann
2019-05-09 23:30             ` Alexei Starovoitov
2019-05-09 23:49               ` Kees Cook
2019-05-09 23:50               ` Eric Dumazet
2019-05-09 23:51                 ` Alexei Starovoitov