bpf.vger.kernel.org archive mirror
* bpf_jit_limit close shave
@ 2021-09-21 11:49 Lorenz Bauer
  2021-09-21 14:34 ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenz Bauer @ 2021-09-21 11:49 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann; +Cc: kernel-team

Hi,

We just had a close shave with bpf_jit_limit. Something on our edge
caused us to cross the default limit, which made seccomp and xt_bpf
filters fail to load. Looking at the source made me realise that we
narrowly avoided taking out our load balancer, which would've been
pretty bad. We still run the LB with CAP_SYS_ADMIN instead of the narrower
CAP_BPF / CAP_NET_ADMIN set. If we had migrated to the lesser capability
set, we would've been prevented from loading new eBPF:

int bpf_jit_charge_modmem(u32 pages)
{
    /* Charge the pages up front; if the new total exceeds the global
     * limit and the caller lacks CAP_SYS_ADMIN, roll the charge back
     * and refuse the allocation. */
    if (atomic_long_add_return(pages, &bpf_jit_current) >
        (bpf_jit_limit >> PAGE_SHIFT)) {
        if (!capable(CAP_SYS_ADMIN)) {
            atomic_long_sub(pages, &bpf_jit_current);
            return -EPERM;
        }
    }

    return 0;
}

Does it make sense to include !capable(CAP_BPF) in the check?

This limit reminds me a bit of the memlock issue, where a global limit
causes coupling between independent systems / processes. Can we remove
the limit in favour of something more fine grained?

Best
Lorenz

--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: bpf_jit_limit close shave
  2021-09-21 11:49 bpf_jit_limit close shave Lorenz Bauer
@ 2021-09-21 14:34 ` Alexei Starovoitov
  2021-09-21 15:52   ` Lorenz Bauer
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2021-09-21 14:34 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team

On Tue, Sep 21, 2021 at 4:50 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Hi,
>
> We just had a close shave with bpf_jit_limit. Something on our edge
> caused us to cross the default limit, which made seccomp and xt_bpf
> filters fail to load. Looking at the source made me realise that we
> narrowly avoided taking out our load balancer, which would've been
> pretty bad. We still run the LB with CAP_SYS_ADMIN instead of narrower
> CAP_BPF, CAP_NET_ADMIN. If we had migrated to the lesser capability
> set we would've been prevented from loading new eBPF:
>
> int bpf_jit_charge_modmem(u32 pages)
> {
>     if (atomic_long_add_return(pages, &bpf_jit_current) >
>         (bpf_jit_limit >> PAGE_SHIFT)) {
>         if (!capable(CAP_SYS_ADMIN)) {
>             atomic_long_sub(pages, &bpf_jit_current);
>             return -EPERM;
>         }
>     }
>
>     return 0;
> }
>
> Does it make sense to include !capable(CAP_BPF) in the check?

Good point. Makes sense to add CAP_BPF there.
Taking down critical networking infrastructure because of this limit
that is supposed to apply to unpriv users only is scary indeed.

> This limit reminds me a bit of the memlock issue, where a global limit
> causes coupling between independent systems / processes. Can we remove
> the limit in favour of something more fine grained?

Right. Unfortunately memcg doesn't distinguish kernel module
memory vs any other memory. All types of memory are memory.
Regardless of whether its type is per-cpu, bpf map memory, bpf jit memory, etc.
That's the main reason for the independent knob for JITed memory.
Since it's a bit special. It's a crude knob. Certainly not perfect.


* Re: bpf_jit_limit close shave
  2021-09-21 14:34 ` Alexei Starovoitov
@ 2021-09-21 15:52   ` Lorenz Bauer
       [not found]     ` <CABEBQi=WfdJ-h+5+fgFXOptDWSk2Oe_V85gR90G2V+PQh9ME0A@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenz Bauer @ 2021-09-21 15:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team

On Tue, 21 Sept 2021 at 15:34, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 4:50 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
> >
> > Does it make sense to include !capable(CAP_BPF) in the check?
>
> Good point. Makes sense to add CAP_BPF there.
> Taking down critical networking infrastructure because of this limit
> that is supposed to apply to unpriv users only is scary indeed.

Ok, I'll send a patch. Can I add a Fixes: 2c78ee898d8f ("bpf:
Implement CAP_BPF")?

Another thought: move the check for bpf_capable before the
atomic_long_add_return? This means we only track JIT allocations from
unprivileged users. As it stands a privileged user can easily "lock
out" unprivileged users, which on our setup is a real concern. We
have several socket filters / SO_REUSEPORT programs which are
critical, and also use lots of XDP from privileged processes as you
know.

>
> > This limit reminds me a bit of the memlock issue, where a global limit
> > causes coupling between independent systems / processes. Can we remove
> > the limit in favour of something more fine grained?
>
> Right. Unfortunately memcg doesn't distinguish kernel module
> memory vs any other memory. All types of memory are memory.
> Regardless of whether its type is per-cpu, bpf map memory, bpf jit memory, etc.
> That's the main reason for the independent knob for JITed memory.
> Since it's a bit special. It's a crude knob. Certainly not perfect.

I'm missing context, how is JIT memory different from these other kinds of memory?

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: bpf_jit_limit close shave
       [not found]     ` <CABEBQi=WfdJ-h+5+fgFXOptDWSk2Oe_V85gR90G2V+PQh9ME0A@mail.gmail.com>
@ 2021-09-21 19:59       ` Alexei Starovoitov
  2021-09-22  8:20         ` Frank Hofmann
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2021-09-21 19:59 UTC (permalink / raw)
  To: Frank Hofmann
  Cc: Lorenz Bauer, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team

On Tue, Sep 21, 2021 at 12:11 PM Frank Hofmann <fhofmann@cloudflare.com> wrote:
>
> Wouldn't that (updating the variable only for unpriv use) also make the leak we ran into impossible to notice?

impossible?
That jit limit is not there on older kernels and doesn't apply to root.
How would you notice such a kernel bug in such conditions?

> (we have something near to a simple reproducer for https://www.spinics.net/lists/kernel/msg4029472.html ... need to extract the relevant parts of an app of ours, will update separately when there)
>
> FrankH.
>
> On Tue, Sep 21, 2021 at 4:52 PM Lorenz Bauer <lmb@cloudflare.com> wrote:
>>
>> On Tue, 21 Sept 2021 at 15:34, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> >
>> > On Tue, Sep 21, 2021 at 4:50 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>> > >
>> > > Does it make sense to include !capable(CAP_BPF) in the check?
>> >
>> > Good point. Makes sense to add CAP_BPF there.
>> > Taking down critical networking infrastructure because of this limit
>> > that is supposed to apply to unpriv users only is scary indeed.
>>
>> Ok, I'll send a patch. Can I add a Fixes: 2c78ee898d8f ("bpf:
>> Implement CAP_BPF")?
>>
>> Another thought: move the check for bpf_capable before the
>> atomic_long_add_return? This means we only track JIT allocations from
>> unprivileged users. As it stands a privileged user can easily "lock
>> out" unprivileged users, which on our setup is a real concern. We
>> have several socket filters / SO_REUSEPORT programs which are
>> critical, and also use lots of XDP from privileged processes as you
>> know.
>>
>> >
>> > > This limit reminds me a bit of the memlock issue, where a global limit
>> > > causes coupling between independent systems / processes. Can we remove
>> > > the limit in favour of something more fine grained?
>> >
>> > Right. Unfortunately memcg doesn't distinguish kernel module
>> > memory vs any other memory. All types of memory are memory.
>> > Regardless of whether its type is per-cpu, bpf map memory, bpf jit memory, etc.
>> > That's the main reason for the independent knob for JITed memory.
>> > Since it's a bit special. It's a crude knob. Certainly not perfect.
>>
>> I'm missing context, how is JIT memory different from these other kinds of memory?
>>
>> Lorenz
>>
>> --
>> Lorenz Bauer  |  Systems Engineer
>> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>>
>> www.cloudflare.com


* Re: bpf_jit_limit close shave
  2021-09-21 19:59       ` Alexei Starovoitov
@ 2021-09-22  8:20         ` Frank Hofmann
  2021-09-22 11:07           ` Lorenz Bauer
  0 siblings, 1 reply; 11+ messages in thread
From: Frank Hofmann @ 2021-09-22  8:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Lorenz Bauer, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team

On Tue, Sep 21, 2021 at 8:59 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 12:11 PM Frank Hofmann <fhofmann@cloudflare.com> wrote:
> >
> > Wouldn't that (updating the variable only for unpriv use) also make the leak we ran into impossible to notice?
>
> impossible?
> That jit limit is not there on older kernels and doesn't apply to root.
> How would you notice such a kernel bug in such conditions?

I'm talking about bpf_jit_current - it's an "overall gauge" for
allocation, priv and unpriv. I understood Lorenz' note as "change it
so it only tracks unpriv BPF mem usage - since we'll never act on
privileged usage anyway"

FrankH.

>
> > (we have something near to a simple reproducer for https://www.spinics.net/lists/kernel/msg4029472.html ... need to extract the relevant parts of an app of ours, will update separately when there)
> >
> > FrankH.
> >
> > On Tue, Sep 21, 2021 at 4:52 PM Lorenz Bauer <lmb@cloudflare.com> wrote:
> >>
> >> On Tue, 21 Sept 2021 at 15:34, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> >
> >> > On Tue, Sep 21, 2021 at 4:50 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
> >> > >
> >> > > Does it make sense to include !capable(CAP_BPF) in the check?
> >> >
> >> > Good point. Makes sense to add CAP_BPF there.
> >> > Taking down critical networking infrastructure because of this limit
> >> > that is supposed to apply to unpriv users only is scary indeed.
> >>
> >> Ok, I'll send a patch. Can I add a Fixes: 2c78ee898d8f ("bpf:
> >> Implement CAP_BPF")?
> >>
> >> Another thought: move the check for bpf_capable before the
> >> atomic_long_add_return? This means we only track JIT allocations from
> >> unprivileged users. As it stands a privileged user can easily "lock
> >> out" unprivileged users, which on our setup is a real concern. We
> >> have several socket filters / SO_REUSEPORT programs which are
> >> critical, and also use lots of XDP from privileged processes as you
> >> know.
> >>
> >> >
> >> > > This limit reminds me a bit of the memlock issue, where a global limit
> >> > > causes coupling between independent systems / processes. Can we remove
> >> > > the limit in favour of something more fine grained?
> >> >
> >> > Right. Unfortunately memcg doesn't distinguish kernel module
> >> > memory vs any other memory. All types of memory are memory.
> >> > Regardless of whether its type is per-cpu, bpf map memory, bpf jit memory, etc.
> >> > That's the main reason for the independent knob for JITed memory.
> >> > Since it's a bit special. It's a crude knob. Certainly not perfect.
> >>
> >> I'm missing context, how is JIT memory different from these other kinds of memory?
> >>
> >> Lorenz
> >>
> >> --
> >> Lorenz Bauer  |  Systems Engineer
> >> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
> >>
> >> www.cloudflare.com


* Re: bpf_jit_limit close shave
  2021-09-22  8:20         ` Frank Hofmann
@ 2021-09-22 11:07           ` Lorenz Bauer
  2021-09-22 21:51             ` Daniel Borkmann
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenz Bauer @ 2021-09-22 11:07 UTC (permalink / raw)
  To: Frank Hofmann
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team

On Wed, 22 Sept 2021 at 09:20, Frank Hofmann <fhofmann@cloudflare.com> wrote:
>
> > That jit limit is not there on older kernels and doesn't apply to root.
> > How would you notice such a kernel bug in such conditions?
>
> I'm talking about bpf_jit_current - it's an "overall gauge" for
> allocation, priv and unpriv. I understood Lorenz' note as "change it
> so it only tracks unpriv BPF mem usage - since we'll never act on
> privileged usage anyway"
>
> FrankH.

Yes, that was my suggestion indeed. What Frank is saying: it looks
like our leak of JIT memory is due to a privileged process. By
exempting privileged processes it would be even harder to notice /
debug. That's true, and brings me back to my question: what is
different about JIT memory that we can't do a better limit?

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: bpf_jit_limit close shave
  2021-09-22 11:07           ` Lorenz Bauer
@ 2021-09-22 21:51             ` Daniel Borkmann
  2021-09-23  2:03               ` Alexei Starovoitov
  2021-09-23  9:16               ` Lorenz Bauer
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Borkmann @ 2021-09-22 21:51 UTC (permalink / raw)
  To: Lorenz Bauer, Frank Hofmann
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	kernel-team

On 9/22/21 1:07 PM, Lorenz Bauer wrote:
> On Wed, 22 Sept 2021 at 09:20, Frank Hofmann <fhofmann@cloudflare.com> wrote:
>>
>>> That jit limit is not there on older kernels and doesn't apply to root.
>>> How would you notice such a kernel bug in such conditions?
>>
>> I'm talking about bpf_jit_current - it's an "overall gauge" for
>> allocation, priv and unpriv. I understood Lorenz' note as "change it
>> so it only tracks unpriv BPF mem usage - since we'll never act on
>> privileged usage anyway"
> 
> Yes, that was my suggestion indeed. What Frank is saying: it looks
> like our leak of JIT memory is due to a privileged process. By
> exempting privileged processes it would be even harder to notice /
> debug. That's true, and brings me back to my question: what is
> different about JIT memory that we can't do a better limit?

The knob with the limit was basically added back then as a band-aid to avoid
unprivileged BPF JIT (cBPF or eBPF) eating up all the module memory to the
point where we cannot even load kernel modules anymore. Given that memory
resource is global, we added the bpf_jit_limit / bpf_jit_current accounting
as a fix/heuristic via ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict
unpriv allocations"). If we wouldn't account for root, how would such detection
proposal work otherwise to block unprivileged? I don't think it's feasible to
only account the latter given privileged progs might have occupied most of the
budget already.

Thanks,
Daniel


* Re: bpf_jit_limit close shave
  2021-09-22 21:51             ` Daniel Borkmann
@ 2021-09-23  2:03               ` Alexei Starovoitov
  2021-09-23  9:16               ` Lorenz Bauer
  1 sibling, 0 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2021-09-23  2:03 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Lorenz Bauer, Frank Hofmann, bpf, Alexei Starovoitov,
	Andrii Nakryiko, kernel-team

On Wed, Sep 22, 2021 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 9/22/21 1:07 PM, Lorenz Bauer wrote:
> > On Wed, 22 Sept 2021 at 09:20, Frank Hofmann <fhofmann@cloudflare.com> wrote:
> >>
> >>> That jit limit is not there on older kernels and doesn't apply to root.
> >>> How would you notice such a kernel bug in such conditions?
> >>
> >> I'm talking about bpf_jit_current - it's an "overall gauge" for
> >> allocation, priv and unpriv. I understood Lorenz' note as "change it
> >> so it only tracks unpriv BPF mem usage - since we'll never act on
> >> privileged usage anyway"
> >
> > Yes, that was my suggestion indeed. What Frank is saying: it looks
> > like our leak of JIT memory is due to a privileged process. By
> > exempting privileged processes it would be even harder to notice /
> > debug. That's true, and brings me back to my question: what is
> > different about JIT memory that we can't do a better limit?

There is nothing special about JIT and kernel module memory.

> The knob with the limit was basically added back then as a band-aid to avoid
> unprivileged BPF JIT (cBPF or eBPF) eating up all the module memory to the
> point where we cannot even load kernel modules anymore. Given that memory
> resource is global, we added the bpf_jit_limit / bpf_jit_current accounting
> as a fix/heuristic via ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict
> unpriv allocations"). If we wouldn't account for root, how would such detection
> proposal work otherwise to block unprivileged? I don't think it's feasible to
> only account the latter given privileged progs might have occupied most of the
> budget already.

Right.
In the end it boils down to module_alloc() not using GFP_ACCOUNT.
It's indeed very similar to rlimit issue we had in the past.
I don't have a preference whether normal kernel mods should be memcg-ed,
but JITed memory certainly can be.
bpf progs memory is already covered.
I think we can do that and then can remove this limit. Just like rlimit.


* Re: bpf_jit_limit close shave
  2021-09-22 21:51             ` Daniel Borkmann
  2021-09-23  2:03               ` Alexei Starovoitov
@ 2021-09-23  9:16               ` Lorenz Bauer
  2021-09-23 11:52                 ` Daniel Borkmann
  1 sibling, 1 reply; 11+ messages in thread
From: Lorenz Bauer @ 2021-09-23  9:16 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Frank Hofmann, Alexei Starovoitov, bpf, Alexei Starovoitov,
	Andrii Nakryiko, kernel-team

On Wed, 22 Sept 2021 at 22:51, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 9/22/21 1:07 PM, Lorenz Bauer wrote:
> > On Wed, 22 Sept 2021 at 09:20, Frank Hofmann <fhofmann@cloudflare.com> wrote:
> >>
> >>> That jit limit is not there on older kernels and doesn't apply to root.
> >>> How would you notice such a kernel bug in such conditions?
> >>
> >> I'm talking about bpf_jit_current - it's an "overall gauge" for
> >> allocation, priv and unpriv. I understood Lorenz' note as "change it
> >> so it only tracks unpriv BPF mem usage - since we'll never act on
> >> privileged usage anyway"
> >
> > Yes, that was my suggestion indeed. What Frank is saying: it looks
> > like our leak of JIT memory is due to a privileged process. By
> > exempting privileged processes it would be even harder to notice /
> > debug. That's true, and brings me back to my question: what is
> > different about JIT memory that we can't do a better limit?
>
> The knob with the limit was basically added back then as a band-aid to avoid
> unprivileged BPF JIT (cBPF or eBPF) eating up all the module memory to the
> point where we cannot even load kernel modules anymore. Given that memory
> > resource is global, we added the bpf_jit_limit / bpf_jit_current accounting
> as a fix/heuristic via ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict
> unpriv allocations"). If we wouldn't account for root, how would such detection
> proposal work otherwise to block unprivileged? I don't think it's feasible to
> only account the latter given privileged progs might have occupied most of the
> budget already.

Thanks, that was the part I was missing. JITed BPF programs are
treated like modules (why?). There is a limited space reserved for
kernel modules.

How does the knob solve the "can't load a new module" problem if our
suggestion / preference is to steer people towards CAP_BPF anyways
(since unpriv BPF is trouble)? Over time all BPF will be privileged
and we're in the same mess again?

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: bpf_jit_limit close shave
  2021-09-23  9:16               ` Lorenz Bauer
@ 2021-09-23 11:52                 ` Daniel Borkmann
  2021-09-24 10:35                   ` Lorenz Bauer
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Borkmann @ 2021-09-23 11:52 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Frank Hofmann, Alexei Starovoitov, bpf, Alexei Starovoitov,
	Andrii Nakryiko, kernel-team

On 9/23/21 11:16 AM, Lorenz Bauer wrote:
> On Wed, 22 Sept 2021 at 22:51, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 9/22/21 1:07 PM, Lorenz Bauer wrote:
>>> On Wed, 22 Sept 2021 at 09:20, Frank Hofmann <fhofmann@cloudflare.com> wrote:
>>>>
>>>>> That jit limit is not there on older kernels and doesn't apply to root.
>>>>> How would you notice such a kernel bug in such conditions?
>>>>
>>>> I'm talking about bpf_jit_current - it's an "overall gauge" for
>>>> allocation, priv and unpriv. I understood Lorenz' note as "change it
>>>> so it only tracks unpriv BPF mem usage - since we'll never act on
>>>> privileged usage anyway"
>>>
>>> Yes, that was my suggestion indeed. What Frank is saying: it looks
>>> like our leak of JIT memory is due to a privileged process. By
>>> exempting privileged processes it would be even harder to notice /
>>> debug. That's true, and brings me back to my question: what is
>>> different about JIT memory that we can't do a better limit?
>>
>> The knob with the limit was basically added back then as a band-aid to avoid
>> unprivileged BPF JIT (cBPF or eBPF) eating up all the module memory to the
>> point where we cannot even load kernel modules anymore. Given that memory
>> resource is global, we added the bpf_jit_limit / bpf_jit_current accounting
>> as a fix/heuristic via ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict
>> unpriv allocations"). If we wouldn't account for root, how would such detection
>> proposal work otherwise to block unprivileged? I don't think it's feasible to
>> only account the latter given privileged progs might have occupied most of the
>> budget already.
> 
> Thanks, that was the part I was missing. JITed BPF programs are
> treated like modules (why?). There is a limited space reserved for
> kernel modules.

See bpf_jit_alloc_exec() which calls module_alloc() for the images' r+x memory
holding the generated opcodes, and there's only one such pool for the system
on the latter: on x86 in particular, the rationale for module_alloc() use is
so that the image is guaranteed to be within +/- 2GB of where the kernel image
resides. See the encoding of BPF_CALL with __bpf_call_base + imm32, for example.

> How does the knob solve the "can't load a new module" problem if our
> suggestion / preference is to steer people towards CAP_BPF anyways
> (since unpriv BPF is trouble)? Over time all BPF will be privileged
> and we're in the same mess again?

Keep in mind that the knob was added before CAP_BPF. In general, unprivileged
cBPF->eBPF is also using the same bpf_jit_alloc_exec() for the JIT, so that
needs to be taken into consideration as well, but if you grant an application
CAP_BPF then you're essentially privileged. The knob's point was to prevent
fully unprivileged users from playing bad games.

Thanks,
Daniel


* Re: bpf_jit_limit close shave
  2021-09-23 11:52                 ` Daniel Borkmann
@ 2021-09-24 10:35                   ` Lorenz Bauer
  0 siblings, 0 replies; 11+ messages in thread
From: Lorenz Bauer @ 2021-09-24 10:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Frank Hofmann, Alexei Starovoitov, bpf, Alexei Starovoitov,
	Andrii Nakryiko, kernel-team

On Thu, 23 Sept 2021 at 12:52, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> See bpf_jit_alloc_exec() which calls module_alloc() for the images' r+x memory
> holding the generated opcodes, and there's only one such pool for the system
> on the latter: on x86 in particular, the rationale for module_alloc() use is
> so that the image is guaranteed to be within +/- 2GB of where the kernel image
> resides. See the encoding of BPF_CALL with __bpf_call_base + imm32, for example.

Thanks, makes a lot more sense now. I sent some more clean up patches your way.

> > How does the knob solve the "can't load a new module" problem if our
> > suggestion / preference is to steer people towards CAP_BPF anyways
> > (since unpriv BPF is trouble)? Over time all BPF will be privileged
> > and we're in the same mess again?
>
> Keep in mind that the knob was added before CAP_BPF. In general, unprivileged
> cBPF->eBPF is also using the same bpf_jit_alloc_exec() for the JIT, so that
> needs to be taken into consideration as well, but if you grant an application
> CAP_BPF then you're essentially privileged. The knob's point was to prevent
> fully unprivileged users from playing bad games.

You're right, it does help with that. Now, how do I solve the problem
of our privileged (but automated!) tooling eating up all the memory
anyways?

As an aside: it's _really_ hard (impossible?) to track down where this
memory is used. cbpf -> ebpf conversions don't show up in bpftool,
where does one go to look?

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

