From: Andy Lutomirski
Date: Tue, 27 Aug 2019 23:12:29 -0700
Subject: Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
To: Alexei Starovoitov
Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu, Steven Rostedt, "David S. Miller", Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <20190828044340.zeha3k3cmmxgfqj7@ast-mbp.dhcp.thefacebook.com>
X-Mailing-List: netdev@vger.kernel.org

On Tue, Aug 27, 2019 at 9:43 PM Alexei Starovoitov wrote:
>
> On Tue, Aug 27, 2019 at 05:55:41PM -0700, Andy Lutomirski wrote:
> >
> > I was hoping for something in Documentation/admin-guide, not in a
> > changelog that's hard to find.
>
> eventually yes.
>
> > > >
> > > > Changing the capability that some existing operation requires could
> > > > break existing programs. The old capability may need to be accepted
> > > > as well.
> > >
> > > As far as I can see there is no ABI breakage. Please point out
> > > which line of the patch may break it.
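To make the compatibility concern quoted above concrete, here is a minimal model in C, assuming capability sets can be treated as plain bitmasks. The `CAPBIT_*` constants and the `allowed_*` helpers are hypothetical stand-ins for illustration only; the kernel itself numbers capabilities differently and checks them with `capable()`. `CAPBIT_BPF` stands for the capability proposed in this patch.

```c
#include <stdbool.h>

/* Hypothetical capability bits for this model only; the kernel numbers
 * capabilities differently and tests them with capable(). */
#define CAPBIT_SYS_ADMIN (1u << 0)
#define CAPBIT_BPF       (1u << 1) /* the capability proposed by the patch */

/* Before the change: the operation is gated on CAP_SYS_ADMIN. */
bool allowed_before(unsigned int caps)
{
	return (caps & CAPBIT_SYS_ADMIN) != 0;
}

/* After the change: the operation is gated on CAP_BPF alone. */
bool allowed_after(unsigned int caps)
{
	return (caps & CAPBIT_BPF) != 0;
}

/* "Accept the old capability as well": either capability suffices. */
bool allowed_after_compat(unsigned int caps)
{
	return (caps & (CAPBIT_BPF | CAPBIT_SYS_ADMIN)) != 0;
}
```

For a task whose set is exactly `CAPBIT_SYS_ADMIN`, `allowed_before()` returns true and `allowed_after()` returns false, which is the behavior change in question; `allowed_after_compat()` models the "accept the old capability as well" option and keeps returning true for that task.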
> >
> > As a more or less arbitrary selection:
> >
> > void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> > {
> >         if (!bpf_prog_kallsyms_candidate(fp) ||
> > -           !capable(CAP_SYS_ADMIN))
> > +           !capable(CAP_BPF))
> >                 return;
> >
> > Before your patch, a task with CAP_SYS_ADMIN could do this. Now it
> > can't. Per the usual Linux definition of "ABI break", this is an ABI
> > break if and only if someone actually did this in a context where they
> > have CAP_SYS_ADMIN but not all capabilities. How confident are you
> > that no one does things like this?
>
> Yes. I'm confident that apps don't drop everything and
> leave cap_sys_admin only before doing bpf() syscall, since it would
> break their own use of networking.
> Hence I'm not going to do the cap_syslog-like "deprecated" message mess
> because of this unfounded concern.
> If I turn out to be wrong we will add this "deprecated mess" later.
>
> >
> > From the previous discussion, you want to make progress toward solving
> > a lot of problems with CAP_BPF. One of them was making BPF
> > firewalling more generally useful. By making CAP_BPF grant the ability
> > to read kernel memory, you will make administrators much more nervous
> > to grant CAP_BPF.
>
> Andy, were your email hacked?
> I explained several times that in this proposal
> CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
> CAP_BPF alone is _not enough_.

You have indeed said this many times. You've stated it as a matter of fact, as though it cannot possibly be discussed. I'm asking you to justify it.

> > Similarly, and correct me if I'm wrong, most of
> > these capabilities are primarily or only useful for tracing, so I
> > don't see why users without CAP_TRACING should get them.
> > bpf_trace_printk(), in particular, even has "trace" in its name :)
> >
> > Also, if a task has CAP_TRACING, it's expected to be able to trace the
> > system -- that's the whole point. Why shouldn't it be able to use BPF
> > to trace the system better?
>
> CAP_TRACING shouldn't be able to do BPF because BPF is not tracing only.

What does "do BPF" even mean? seccomp() does BPF. SO_ATTACH_FILTER does BPF. Saying that using BPF should require a specific capability seems kind of like saying that using the network should require a specific capability. Linux (and Unixy systems in general) distinguishes between binding low-number ports, binding high-number ports, using raw sockets, and changing the system's IP address. These have different implications and require different capabilities.

It seems like you are specifically trying to add a new switch to turn as much of BPF as possible on and off. Why?

> >
> > test_run allows fully controlled inputs, in a context where a program
> > can trivially flush caches, mistrain branch predictors, etc first. It
> > seems to me that, if a JITted bpf program contains an exploitable
> > speculation gadget (MDS, Spectre v1, RSB, or anything else),
>
> speaking of MDS... I already asked you to help investigate its
> applicability with existing bpf exposure. Are you going to do that?

I am blissfully uninvolved in MDS, and I don't know all that much more about the overall mechanism than a random reader of tech news :) ISTM there are two meaningful ways that BPF could be involved: a BPF program could leak info into the state exposed by MDS, or a BPF program could try to read that state. From what little I understand, it's essentially inevitable that BPF leaks information into MDS state, and this is probably even controllable by an attacker that understands MDS in enough detail.
So the interesting questions are: can BPF be used to read MDS state, and can BPF be used to leak information to an attacker in a more useful way than the rest of the kernel? Keeping in mind that the kernel will flush MDS state on every exit to usermode, I think the most likely attack is to try to read MDS state with BPF. This could happen, I suppose -- BPF programs can easily contain the usual speculation gadgets of "do something and read an address that depends on the outcome". Fortunately, outside of bpf_probe_read(), AFAIK BPF programs can't directly touch user memory, and an attacker that is allowed to use bpf_probe_read() doesn't need MDS to read things. So it's not entirely obvious to me how an attack would be mounted. test_run would make it a lot easier, I think.

>
> > it will
> > be *much* easier to exploit it using test_run than using normal
> > network traffic. Similarly, normal network traffic will have network
> > headers that are valid enough to have caused the BPF program to be
> > invoked in the first place. test_run can inject arbitrary garbage.
>
> Please take a look at Jann's var1 exploit. Was it hard to run bpf prog
> in controlled environment without test_run command ?
>

Can you send me a link?
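The networking comparison made earlier (low ports, high ports, raw sockets, changing the IP address) can be sketched as a small lookup from operation to required capability. This is a simplified model, not kernel code: the enums and `required_netcap()` are hypothetical, and the port rule assumes the default `net.ipv4.ip_unprivileged_port_start` of 1024.

```c
/* Capabilities involved in the comparison; the names mirror real Linux
 * capabilities, but this enum is a model, not <linux/capability.h>. */
enum netcap {
	NETCAP_NONE,             /* no capability required */
	NETCAP_NET_BIND_SERVICE, /* bind to a privileged (low) port */
	NETCAP_NET_RAW,          /* open raw sockets */
	NETCAP_NET_ADMIN,        /* reconfigure interfaces and addresses */
};

/* Network operations mentioned in the discussion. */
enum netop {
	OP_BIND_PORT,      /* bind() a TCP/UDP socket to a given port */
	OP_RAW_SOCKET,     /* socket(AF_INET, SOCK_RAW, ...) */
	OP_SET_IP_ADDRESS, /* change an interface's IP address */
};

/* Assumed default of net.ipv4.ip_unprivileged_port_start. */
#define UNPRIV_PORT_START 1024u

/* Return the capability a task needs for op; port is only used by
 * OP_BIND_PORT, where 0 means "let the kernel pick a port". */
enum netcap required_netcap(enum netop op, unsigned int port)
{
	switch (op) {
	case OP_BIND_PORT:
		/* Low ports are privileged; high ports and port 0 are not. */
		if (port != 0 && port < UNPRIV_PORT_START)
			return NETCAP_NET_BIND_SERVICE;
		return NETCAP_NONE;
	case OP_RAW_SOCKET:
		return NETCAP_NET_RAW;
	case OP_SET_IP_ADDRESS:
		return NETCAP_NET_ADMIN;
	}
	return NETCAP_NONE; /* not reached for valid op values */
}
```

In this model, `required_netcap(OP_BIND_PORT, 80)` yields `NETCAP_NET_BIND_SERVICE` while port 8080 yields `NETCAP_NONE`, and raw sockets and address changes each map to their own capability. That per-operation granularity is the contrast being drawn with a single switch for "BPF".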