From mboxrd@z Thu Jan  1 00:00:00 1970
From: Shmulik Ladkani <shmulik@metanetworks.com>
Subject: [Q] Unable to load SCHED_CLS/SCHED_ACT bpf programs from outside
 init_user_ns
Date: Sat, 10 Feb 2018 09:46:17 +0200
Message-ID: <20180210094617.3ca6faf8@pixies>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Chenbo Feng <fengc@google.com>, eyal@metanetworks.com,
        netdev@vger.kernel.org, Jamal Hadi Salim <jhs@mojatatu.com>,
        Cong Wang <xiyou.wangcong@gmail.com>,
        Jiri Pirko <jiri@resnulli.us>
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-wm0-f42.google.com ([74.125.82.42]:33962 "EHLO
        mail-wm0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750756AbeBJHqY (ORCPT
        <rfc822;netdev@vger.kernel.org>); Sat, 10 Feb 2018 02:46:24 -0500
Received: by mail-wm0-f42.google.com with SMTP id j21so1572919wmh.1
        for <netdev@vger.kernel.org>; Fri, 09 Feb 2018 23:46:24 -0800 (PST)
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

Hi,

Apparently one cannot use TC cls_bpf/act_bpf when running in a user ns
other than init_user_ns, as bpf_prog_load does not permit loading
these types of progs. Snippet:

        if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
            type != BPF_PROG_TYPE_CGROUP_SKB &&
            !capable(CAP_SYS_ADMIN))
                return -EPERM;

The load fails even though the user performing BPF_PROG_LOAD has both
CAP_SYS_ADMIN and CAP_NET_ADMIN (the latter as required by
RTM_NEWTFILTER) in his current_user_ns, because capable() tests the
capability against init_user_ns only.
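
To make the failure mode concrete, here is a small user-space model of the
check above. It is only a sketch, not kernel code: the enum, the struct and
all model_* names are hypothetical, and the "relaxed" variant is one possible
shape of the relaxation asked about in this mail, not an existing kernel
behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Program types named in this mail; the values are local to this model. */
enum model_prog_type {
	MODEL_SOCKET_FILTER,
	MODEL_CGROUP_SKB,
	MODEL_SCHED_CLS,
	MODEL_SCHED_ACT,
};

/* Hypothetical summary of where a task holds CAP_SYS_ADMIN. */
struct model_task {
	bool admin_in_init_ns;	/* what capable(CAP_SYS_ADMIN) tests */
	bool admin_in_own_ns;	/* what an ns_capable() style test would see */
};

/* Mirrors the quoted check: only the init_user_ns capability counts. */
int model_prog_load(enum model_prog_type type, const struct model_task *t)
{
	if (type != MODEL_SOCKET_FILTER &&
	    type != MODEL_CGROUP_SKB &&
	    !t->admin_in_init_ns)
		return -EPERM;
	return 0;
}

/* Assumed relaxation: also accept the capability in the caller's own ns. */
int model_prog_load_relaxed(enum model_prog_type type,
			    const struct model_task *t)
{
	if (type != MODEL_SOCKET_FILTER &&
	    type != MODEL_CGROUP_SKB &&
	    !t->admin_in_init_ns && !t->admin_in_own_ns)
		return -EPERM;
	return 0;
}

/* A containerized caller: privileged in its own user ns only. */
void model_examples(void)
{
	struct model_task container_root = {
		.admin_in_init_ns = false,
		.admin_in_own_ns = true,
	};

	assert(model_prog_load(MODEL_SCHED_CLS, &container_root) == -EPERM);
	assert(model_prog_load(MODEL_SOCKET_FILTER, &container_root) == 0);
	assert(model_prog_load_relaxed(MODEL_SCHED_CLS, &container_root) == 0);
}
```

In this model the container's root is rejected for SCHED_CLS/SCHED_ACT under
the current check, while SOCKET_FILTER and CGROUP_SKB pass regardless of
capabilities. The model says nothing about whether the relaxed form is safe.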

This prevents using tc cls_bpf/act_bpf in containerized software
stacks (where in contrast other tc cls/act are permitted).

The original restriction comes from
    1be7f75d1668 "bpf: enable non-root eBPF programs"
Quote:
    tracing and tc cls/act program types still require root permissions,
    since tracing actually needs to be able to see all kernel pointers
    and tc is for root only.

Can the restriction be relaxed, as done for TYPE_SOCKET_FILTER and later
for TYPE_CGROUP_SKB?

Are the SCHED_CLS/SCHED_ACT progs still susceptible to leaking kernel
pointers?
If so, can we restrict only certain operations which are guaranteed not
to leak, so that tc cls_bpf/act_bpf can still be used outside
init_user_ns?

Thanks,
Shmulik