* [Q] Unable to load SCHED_CLS/SCHED_ACT bpf programs from outside init_user_ns
From: Shmulik Ladkani @ 2018-02-10  7:46 UTC
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Chenbo Feng, eyal, netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko

Hi,

Apparently one cannot use TC cls_bpf/act_bpf if running from a user ns
other than the init_user_ns, as bpf_prog_load does not permit loading
these types of progs, snip:

        if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
            type != BPF_PROG_TYPE_CGROUP_SKB &&
            !capable(CAP_SYS_ADMIN))
                return -EPERM;

although the user performing BPF_PROG_LOAD has both CAP_SYS_ADMIN and
CAP_NET_ADMIN (the latter as required by RTM_NEWTFILTER) in his
current_user_ns.
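
To make the distinction concrete, here is a minimal userspace sketch of why
the check above fails from a child user ns. It is a model under stated
assumptions, not kernel code: struct task_model, the MODEL_* constants and
the model_* functions are hypothetical names introduced for illustration.
The model reduces capable() to "holds the capability in init_user_ns",
whereas an ns_capable()-style check would look at the caller's own user ns.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's BPF_PROG_TYPE_* constants. */
enum model_prog_type {
	MODEL_SOCKET_FILTER,
	MODEL_CGROUP_SKB,
	MODEL_SCHED_CLS,
	MODEL_SCHED_ACT,
};

/* Hypothetical, simplified view of the calling task. */
struct task_model {
	bool in_init_user_ns;         /* task lives in init_user_ns */
	bool cap_sys_admin_in_own_ns; /* CAP_SYS_ADMIN in its own user ns */
};

/* Models capable(CAP_SYS_ADMIN): the capability must be held in init_user_ns. */
static bool model_capable(const struct task_model *t)
{
	return t->in_init_user_ns && t->cap_sys_admin_in_own_ns;
}

/* Models a check against the caller's own user ns instead. */
static bool model_ns_capable(const struct task_model *t)
{
	return t->cap_sys_admin_in_own_ns;
}

/* Mirrors the quoted condition: 0 if the load may proceed, -1 for -EPERM. */
static int model_prog_load_check(enum model_prog_type type,
				 const struct task_model *t)
{
	if (type != MODEL_SOCKET_FILTER &&
	    type != MODEL_CGROUP_SKB &&
	    !model_capable(t))
		return -1;
	return 0;
}
```

With a task that is root inside a container (in_init_user_ns = false,
cap_sys_admin_in_own_ns = true), model_prog_load_check() rejects
MODEL_SCHED_CLS and MODEL_SCHED_ACT but accepts MODEL_SOCKET_FILTER, even
though model_ns_capable() is true for that task.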

This prevents using tc cls_bpf/act_bpf in containerized software
stacks (where in contrast other tc cls/act are permitted).

The original restriction comes from
    1be7f75d1668 "bpf: enable non-root eBPF programs"
quote:
    tracing and tc cls/act program types still require root permissions,
    since tracing actually needs to be able to see all kernel pointers
    and tc is for root only.

Can the restriction be relaxed, as done for TYPE_SOCKET_FILTER and later
for TYPE_CGROUP_SKB?

Are the SCHED_CLS/SCHED_ACT progs still susceptible to leaking kernel
pointers?
If so, can we restrict only certain operations which are guaranteed not
to leak, so that tc cls_bpf/act_bpf can still be used outside
init_user_ns?

Thanks,
Shmulik


* Re: [Q] Unable to load SCHED_CLS/SCHED_ACT bpf programs from outside init_user_ns
From: Daniel Borkmann @ 2018-02-10 13:08 UTC
  To: Shmulik Ladkani, Alexei Starovoitov
  Cc: Chenbo Feng, eyal, netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko

Hi Shmulik,

On 02/10/2018 08:46 AM, Shmulik Ladkani wrote:
> Hi,
> 
> Apparently one cannot use TC cls_bpf/act_bpf if running from a user ns
> other than the init_user_ns, as bpf_prog_load does not permit loading
> these types of progs, snip:
> 
>         if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
>             type != BPF_PROG_TYPE_CGROUP_SKB &&
>             !capable(CAP_SYS_ADMIN))
>                 return -EPERM;
> 
> although the user performing BPF_PROG_LOAD has both CAP_SYS_ADMIN and
> CAP_NET_ADMIN (the latter as required by RTM_NEWTFILTER) in his
> current_user_ns.
> 
> This prevents using tc cls_bpf/act_bpf in containerized software
> stacks (where in contrast other tc cls/act are permitted).

Not really, it's correct that it's initns root-only, but for containers
control plane can attach BPF progs out of initns into the host-facing
veth on ingress/egress clsact side to enforce policy, mangle packets etc.
The other option you would have is that controller would load and pin
the prog as a node into BPF fs and you can then get the fd and attach
it to the veth inside the netns if this is what you're after (the
attach itself in the second step does not require anything extra compared
to rest of tc) provided the mount is shared at setup time (but could
later be removed in the container for example). In future it might be
subject to change to also enable it for userns under the constraint that
verifier puts more restrictions in place in roughly similar fashion to
current unpriv program types, that work just hasn't been tackled yet.
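
The pin-and-reopen option described above can be sketched with the raw
bpf(2) syscall as follows. This is a minimal sketch under assumptions, not
a complete tool: the helper names sys_bpf, pin_prog and get_pinned_prog are
hypothetical, the program is assumed to have been loaded already by the
controller in initns, and whether BPF_OBJ_GET succeeds inside a given
container depends on the kernel version, its settings, and the BPF fs mount
being visible there. Attaching the returned fd to a device is left to tc.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around bpf(2); glibc does not provide one. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/*
 * Controller side (initns): pin an already loaded program fd as a node
 * in the BPF fs. Returns 0 on success, -1 with errno set on failure.
 */
static int pin_prog(int prog_fd, const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;
	attr.bpf_fd = (uint32_t)prog_fd;
	return (int)sys_bpf(BPF_OBJ_PIN, &attr);
}

/*
 * Container side: reopen the pinned node to obtain a program fd.
 * Returns the fd on success, -1 with errno set on failure.
 */
static int get_pinned_prog(const char *path)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;
	return (int)sys_bpf(BPF_OBJ_GET, &attr);
}
```

In this sketch the controller would call pin_prog() with a path under its
BPF fs mount, and the container would later call get_pinned_prog() on the
same path as seen through the shared mount; both return -1 on failure, so
callers should check errno.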

Thanks,
Daniel

> The original restriction comes from
>     1be7f75d1668 "bpf: enable non-root eBPF programs"
> quote:
>     tracing and tc cls/act program types still require root permissions,
>     since tracing actually needs to be able to see all kernel pointers
>     and tc is for root only.
> 
> Can the restriction be relaxed, as done for TYPE_SOCKET_FILTER and later
> for TYPE_CGROUP_SKB?
> 
> Are the SCHED_CLS/SCHED_ACT progs still susceptible to leaking kernel
> pointers?
> If so, can we restrict only certain operations which are guaranteed not
> to leak, so that tc cls_bpf/act_bpf can still be used outside
> init_user_ns?
> 
> Thanks,
> Shmulik


* Re: [Q] Unable to load SCHED_CLS/SCHED_ACT bpf programs from outside init_user_ns
From: Shmulik Ladkani @ 2018-02-10 15:28 UTC
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Chenbo Feng, eyal, netdev, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko

Hi,

On Sat, 10 Feb 2018 14:08:58 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> Hi Shmulik,
> 
> On 02/10/2018 08:46 AM, Shmulik Ladkani wrote:
> > Hi,
> > 
> > Apparently one cannot use TC cls_bpf/act_bpf if running from a user ns
> > other than the init_user_ns, as bpf_prog_load does not permit loading
> > these types of progs, snip:
> > 
> >         if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
> >             type != BPF_PROG_TYPE_CGROUP_SKB &&
> >             !capable(CAP_SYS_ADMIN))
> >                 return -EPERM;
> > 
> > although the user performing BPF_PROG_LOAD has both CAP_SYS_ADMIN and
> > CAP_NET_ADMIN (the latter as required by RTM_NEWTFILTER) in his
> > current_user_ns.
> > 
> > This prevents using tc cls_bpf/act_bpf in containerized software
> > stacks (where in contrast other tc cls/act are permitted).  
> 
> Not really, it's correct that it's initns root-only, but for containers
> control plane can attach BPF progs out of initns into the host-facing
> veth on ingress/egress clsact side to enforce policy, mangle packets etc.
> The other option you would have is that controller would load and pin
> the prog as a node into BPF fs and you can then get the fd and attach
> it to the veth inside the netns if this is what you're after (the
> attach itself in the second step does not require anything extra compared
> to rest of tc) provided the mount is shared at setup time (but could
> later be removed in the container for example).

Thanks Daniel for the suggestions.

Unfortunately these won't do for our application. Assume, for example, a
multi-tenant network service, where each container holds the
application stack servicing a tenant. The host in this case is rather
dumb.

Moreover, the software stack in each container may create various
network devices dynamically (e.g. tunnels, dummies) and needs to apply
some cls/act on these dynamic virtual devices (and not on the initial
veth itself).

> In future it might be
> subject to change to also enable it for userns under the constraint that
> verifier puts more restrictions in place in roughly similar fashion to
> current unpriv program types, that work just hasn't been tackled yet.

How far are we from achieving this?
Can you point to what's missing? Perhaps we can assist on the matter.

Thanks,
Shmulik
