From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A400CC76186 for ; Wed, 24 Jul 2019 01:40:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 61EE62238C for ; Wed, 24 Jul 2019 01:40:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=amacapital-net.20150623.gappssmtp.com header.i=@amacapital-net.20150623.gappssmtp.com header.b="hferate5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728423AbfGXBkd (ORCPT ); Tue, 23 Jul 2019 21:40:33 -0400 Received: from mail-pl1-f196.google.com ([209.85.214.196]:46745 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726339AbfGXBkc (ORCPT ); Tue, 23 Jul 2019 21:40:32 -0400 Received: by mail-pl1-f196.google.com with SMTP id c2so21284382plz.13 for ; Tue, 23 Jul 2019 18:40:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amacapital-net.20150623.gappssmtp.com; s=20150623; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=fugMyQialasQg77i1MwbSB/hy058oIuvmQv9Y6tRr8I=; b=hferate5+QJcLPj2tAGig4LvzjyxwA7qbAq+ZZYnyGQsHCf0vtfGC2nYAcoCq/ubtt g9J065NnEX5JYfhjXkzw+JksQc3nSV+fUISqXdOjqNIM3o471s6CIOD9nN4ZqmuCv+t2 cjKhj4jA4OBb6VxlsCkziQ99TueAg0cXav6rh6Eqemal0op1Ua63R+nYU0R66gL+T7Z4 3te++RkMDL0hFZf8QI6GQvddpT14rjbTeLdK/zad5CMPRrVpqaMGl08KZnhsi7ap41u9 w28qSz8DnrLLiG25UPaQtkN5trn9t4tnn8TyXb5Crr+/S6JgBin4OpZnVE7+UFZZZOnX rWOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=fugMyQialasQg77i1MwbSB/hy058oIuvmQv9Y6tRr8I=; b=R6wQZxG/c4PBs9GNLlYK6JklF05QNsX3qB7KKPpRdS04nBfYVItL/J6KkDev/dHYqN UcQUmiO83hSgubvpjB7brB0gzBq1nMzUe3s115HquZLCpSO+MOCvnzXwd3DlGZTxO4b8 z/gIJxro0RBzxKv7x3169pQl67KYh4QGv1P2UNfUxEDnIrcHRpnN1kDPHB5D8IM/Eipk Hj1Nzfk1Qcq9jgIrqmVBXL6JT3MA5MPjaZXlVGSpBcIwGBkaMWXG3Rcu5RPXbSDFcUlN I12KK38gt7mT/eZAX6cSQxsF9m1/uOprV/DLoVqV3hAgsJVJjD9+9KsadkS/qY/mXNdN NVrQ== X-Gm-Message-State: APjAAAXe1XrHFGMP6nb9NDXlItcKHoL2tvwoUuV7IqiB1pVYwE5q7RK7 RfhdM3Z1kIPDnPxOu+nLA3HcDaBtcTE= X-Google-Smtp-Source: APXvYqxDgG1VQC8Z4Kl+Efc0w8MxVrAvet4CeE5c0galulRK0so/jOD5qMSdZrhm0Y2qhb3f7NJ4QA== X-Received: by 2002:a17:902:1486:: with SMTP id k6mr81977059pla.177.1563932431622; Tue, 23 Jul 2019 18:40:31 -0700 (PDT) Received: from ?IPv6:2600:1010:b055:19e0:81e6:db78:9a51:8f05? ([2600:1010:b055:19e0:81e6:db78:9a51:8f05]) by smtp.gmail.com with ESMTPSA id t8sm48118442pfq.31.2019.07.23.18.40.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 23 Jul 2019 18:40:30 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (1.0) Subject: Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf From: Andy Lutomirski X-Mailer: iPhone Mail (16F203) In-Reply-To: <514D5453-0AEE-420F-AEB6-3F4F58C62E7E@fb.com> Date: Tue, 23 Jul 2019 18:40:28 -0700 Cc: Andy Lutomirski , Kees Cook , "linux-security@vger.kernel.org" , Networking , bpf , Alexei Starovoitov , Daniel Borkmann , Kernel Team , Lorenz Bauer , Jann Horn , Greg KH , Linux API Content-Transfer-Encoding: quoted-printable Message-Id: <1DE886F3-3982-45DE-B545-67AD6A4871AB@amacapital.net> References: <20190627201923.2589391-1-songliubraving@fb.com> <20190627201923.2589391-2-songliubraving@fb.com> <21894f45-70d8-dfca-8c02-044f776c5e05@kernel.org> <3C595328-3ABE-4421-9772-8D41094A4F57@fb.com> <0DE7F23E-9CD2-4F03-82B5-835506B59056@fb.com> <201907021115.DCD56BBABB@keescook> <4A7A225A-6C23-4C0F-9A95-7C6C56B281ED@fb.com> <514D5453-0AEE-420F-AEB6-3F4F58C62E7E@fb.com> To: Song Liu Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org > On Jul 23, 2019, at 3:56 PM, Song Liu wrote: >=20 >=20 >=20 >> On Jul 23, 2019, at 8:11 AM, Andy Lutomirski wrote: >>=20 >> On Mon, Jul 22, 2019 at 1:54 PM Song Liu wrote: >>>=20 >>> Hi Andy, Lorenz, and all, >>>=20 >>>> On Jul 2, 2019, at 2:32 PM, Andy Lutomirski wrote: >>>>=20 >>>> On Tue, Jul 2, 2019 at 2:04 PM Kees Cook wrote:= >>>>>=20 >>>>>> On Mon, Jul 01, 2019 at 06:59:13PM -0700, Andy Lutomirski wrote: >>>>>> I think I'm understanding your motivation. You're not trying to make= >>>>>> bpf() generically usable without privilege -- you're trying to create= >>>>>> a way to allow certain users to access dangerous bpf functionality >>>>>> within some limits. >>>>>>=20 >>>>>> That's a perfectly fine goal, but I think you're reinventing the >>>>>> wheel, and the wheel you're reinventing is quite complicated and >>>>>> already exists. I think you should teach bpftool to be secure when >>>>>> installed setuid root or with fscaps enabled and put your policy in >>>>>> bpftool. If you want to harden this a little bit, it would seem >>>>>> entirely reasonable to add a new CAP_BPF_ADMIN and change some, but >>>>>> not all, of the capable() checks to check CAP_BPF_ADMIN instead of th= e >>>>>> capabilities that they currently check. >>>>>=20 >>>>> If finer grained controls are wanted, it does seem like the /dev/bpf >>>>> path makes the most sense. open, request abilities, use fd. The open c= an >>>>> be mediated by DAC and LSM. The request can be mediated by LSM. This >>>>> provides a way to add policy at the LSM level and at the tool level. >>>>> (i.e. For tool-level controls: leave LSM wide open, make /dev/bpf owne= d >>>>> by "bpfadmin" and bpftool becomes setuid "bpfadmin". For fine-grained >>>>> controls, leave /dev/bpf wide open and add policy to SELinux, etc.) >>>>>=20 >>>>> With only a new CAP, you don't get the fine-grained controls. (The >>>>> "request abilities" part is the key there.) >>>>=20 >>>> Sure you do: the effective set. It has somewhat bizarre defaults, but >>>> I don't think that's a real problem. Also, this wouldn't be like >>>> CAP_DAC_READ_SEARCH -- you can't accidentally use your BPF caps. >>>>=20 >>>> I think that a /dev capability-like object isn't totally nuts, but I >>>> think we should do it well, and this patch doesn't really achieve >>>> that. But I don't think bpf wants fine-grained controls like this at >>>> all -- as I pointed upthread, a fine-grained solution really wants >>>> different treatment for the different capable() checks, and a bunch of >>>> them won't resemble capabilities or /dev/bpf at all. >>>=20 >>> With 5.3-rc1 out, I am back on this. :) >>>=20 >>> How about we modify the set as: >>> 1. Introduce sys_bpf_with_cap() that takes fd of /dev/bpf. >>=20 >> I'm fine with this in principle, but: >>=20 >>> 2. Better handling of capable() calls through bpf code. I guess the >>> biggest problem here is is_priv in verifier.c:bpf_check(). >>=20 >> I think it would be good to understand exactly what /dev/bpf will >> enable one to do. Without some care, it would just become the next >> CAP_SYS_ADMIN: if you can open it, sure, you're not root, but you can >> intercept network traffic, modify cgroup behavior, and do plenty of >> other things, any of which can probably be used to completely take >> over the system. >=20 > Well, yes. sys_bpf() is pretty powerful.=20 >=20 > The goal of /dev/bpf is to enable special users to call sys_bpf(). In=20 > the meanwhile, such users should not take down the whole system easily > by accident, e.g., with rm -rf /. That=E2=80=99s easy, though =E2=80=94 bpftool could learn to read /etc/bpfus= ers before allowing ruid !=3D 0. >=20 > It is similar to CAP_BPF_ADMIN, without really adding the CAP_. =20 >=20 > I think adding new CAP_ requires much more effort.=20 >=20 A new CAP_ is straightforward =E2=80=94 add the definition and change the ma= x cap. >>=20 >> It would also be nice to understand why you can't do what you need to >> do entirely in user code using setuid or fscaps. >=20 > It is not very easy to achieve the same control: only certain users can > run certain tools (bpftool, etc.).=20 >=20 > The closest approach I can find is: > 1. use libcap (pam_cap) to give CAP_SETUID to certain users; > 2. add setuid(0) to bpftool. >=20 > The difference between this approach and /dev/bpf is that certain users > would be able to run other tools that call setuid(). Though I am not=20 > sure how many tools call setuid(), and how risky they are.=20 I think you=E2=80=99re misunderstanding me. Install bpftool with either the s= etuid (S_ISUID) mode or with an appropriate fscap bit =E2=80=94 see the setc= ap(8) manpage. The downside of this approach is that it won=E2=80=99t work well in a contai= ner, and containers are cool these days :) >=20 >>=20 >> Finally, at risk of rehashing some old arguments, I'll point out that >> the bpf() syscall is an unusual design to begin with. As an example, >> consider bpf_prog_attach(). Outside of bpf(), if I want to change the >> behavior of a cgroup, I would write to a file in >> /sys/kernel/cgroup/unified/whatever/, and normal DAC and MAC rules >> apply. With bpf(), however, I just call bpf() to attach a program to >> the cgroup. bpf() says "oh, you are capable(CAP_NET_ADMIN) -- go for >> it!". Unless I missed something major, and I just re-read the code, >> there is no check that the caller has write or LSM permission to >> anything at all in cgroupfs, and the existing API would make it very >> awkward to impose any kind of DAC rules here. >>=20 >> So I think it might actually be time to repay some techincal debt and >> come up with a real fix. As a less intrusive approach, you could see >> about requiring ownership of the cgroup directory instead of >> CAP_NET_ADMIN. As a more intrusive but perhaps better approach, you >> could invert the logic to to make it work like everything outside of >> cgroup: add pseudo-files like bpf.inet_ingress to the cgroup >> directories, and require a writable fd to *that* to a new improved >> attach API. If a user could do: >>=20 >> int fd =3D open("/sys/fs/cgroup/.../bpf.inet_attach", O_RDWR); /* usual >> DAC and MAC policy applies */ >> int bpf_fd =3D setup the bpf stuff; /* no privilege required, unless >> the program is huge or needs is_priv */ >> bpf(BPF_IMPROVED_ATTACH, target =3D fd, program =3D bpf_fd); >>=20 >> there would be no capabilities or global privilege at all required for >> this. It would just work with cgroup delegation, containers, etc. >>=20 >> I think you could even pull off this type of API change with only >> libbpf changes. In particular, there's this code: >>=20 >> int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type= , >> unsigned int flags) >> { >> union bpf_attr attr; >>=20 >> memset(&attr, 0, sizeof(attr)); >> attr.target_fd =3D target_fd; >> attr.attach_bpf_fd =3D prog_fd; >> attr.attach_type =3D type; >> attr.attach_flags =3D flags; >>=20 >> return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr)); >> } >>=20 >> This would instead do something like: >>=20 >> int specific_target_fd =3D openat(target_fd, bpf_type_to_target[type], O_= RDWR); >> attr.target_fd =3D specific_target_fd; >> ... >>=20 >> return sys_bpf(BPF_PROG_IMPROVED_ATTACH, &attr, sizeof(attr)); >>=20 >> Would this solve your problem without needing /dev/bpf at all? >=20 > This gives fine grain access control. I think it solves the problem.=20 > But it also requires a lot of rework to sys_bpf(). And it may also=20 > break backward/forward compatibility? >=20 I think the compatibility issue is manageable. The current bpf() interface w= ould be supported for at least several years, and libbpf could detect that t= he new interface isn=E2=80=99t supported and fall back the old interface > Personally, I think it is an overkill for the original motivation:=20 > call sys_bpf() with special user instead of root.=20 It=E2=80=99s overkill for your specific use case, but I=E2=80=99m trying to e= ncourage you to either solve your problem entirely in userspace or to solve a= more general problem in the kernel :) In furtherance of bpf=E2=80=99s goal of world domination, I think it would b= e great if it Just Worked in a container. My proposal does this.