Subject: Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski
In-Reply-To: <20190819172718.jwnvvotssxwhc7m6@ast-mbp.dhcp.thefacebook.com>
Date: Mon, 19 Aug 2019 10:38:56 -0700
Cc: Thomas Gleixner, Jordan Glover, Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
References: <20190815172856.yoqvgu2yfrgbkowu@ast-mbp.dhcp.thefacebook.com> <20190815230808.2o2qe7a72cwdce2m@ast-mbp.dhcp.thefacebook.com> <20190816195233.vzqqbqrivnooohq6@ast-mbp.dhcp.thefacebook.com>
 <20190817150245.xxzxqjpvgqsxmloe@ast-mbp> <20190819172718.jwnvvotssxwhc7m6@ast-mbp.dhcp.thefacebook.com>
To: Alexei Starovoitov

> On Aug 19, 2019, at 10:27 AM, Alexei Starovoitov wrote:
>
>> On Mon, Aug 19, 2019 at 11:15:11AM +0200, Thomas Gleixner wrote:
>> Alexei,
>>
>>> On Sat, 17 Aug 2019, Alexei Starovoitov wrote:
>>>> On Fri, Aug 16, 2019 at 10:28:29PM +0200, Thomas Gleixner wrote:
>>>> On Fri, 16 Aug 2019, Alexei Starovoitov wrote:
>>>> While real usecases are helpful to understand a design decision, the design
>>>> needs to be usecase independent.
>>>>
>>>> The kernel provides mechanisms, not policies. My impression of this whole
>>>> discussion is that it is policy driven. That's the wrong approach.
>>>
>>> not sure what you mean by 'policy driven'.
>>> Proposed CAP_BPF is a policy?
>>
>> I was referring to the discussion as a whole.
>>
>>> Can kernel.unprivileged_bpf_disabled=1 be used now?
>>> Yes, but it will weaken overall system security because things that
>>> use unpriv to load bpf and CAP_NET_ADMIN to attach bpf would need
>>> to move to stronger CAP_SYS_ADMIN.
>>>
>>> With CAP_BPF both load and attach would happen under CAP_BPF
>>> instead of CAP_SYS_ADMIN.
>>
>> I'm not arguing against that.
>>
>>>> So let's look at the mechanisms which we have at hand:
>>>>
>>>> 1) Capabilities
>>>>
>>>> 2) SUID and dropping privileges
>>>>
>>>> 3) Seccomp and LSM
>>>>
>>>> Now the real interesting questions are:
>>>>
>>>> A) What kind of restrictions does BPF allow? Is it a binary on/off or is
>>>> there a more fine-grained control of BPF functionality?
>>>>
>>>> TBH, I can't tell.
>>>>
>>>> B) Depending on the answer to #A what is the control possibility for
>>>> #1/#2/#3 ?
>>>
>>> Can any of the mechanisms 1/2/3 address the concern in mds.rst?
>>
>> Well, that depends.
>> As with any other security policy which is implemented
>> via these mechanisms, the policy can be strict enough to prevent it by not
>> allowing certain operations. The more fine-grained the control is, the more it
>> allows the administrator who implements the policy to remove the
>> 'dangerous' parts from an untrusted user.
>>
>> So really question #A is important for this. Is BPF just providing a binary
>> ON/OFF knob or does it allow to disable/enable certain aspects of BPF
>> functionality in a more fine-grained way? If the latter, then it might be
>> possible to control functionality which might be abused for exploits of
>> some sorts (including MDS) in a way which allows other parts of BPF to be
>> exposed to less privileged contexts.
>
> I see. So the kernel.unprivileged_bpf_disabled knob is binary and I think it's
> the right mechanism to expose to users.
> Having N knobs for every map/prog type won't decrease attack surface.
> In the other email Andy's quoting seccomp man page...
> Today seccomp cannot really look into bpf_attr syscall args, but even
> if it could it won't secure the system.
> Examples:
> 1.
> spectre v2 is using bpf in-kernel interpreter in speculative way.
> The mere presence of interpreter as part of kernel .text makes the exploit
> easier to do. That was the reason to do CONFIG_BPF_JIT_ALWAYS_ON.
> For this case even kernel.unprivileged_bpf_disabled=1 was hopeless.
>
> 2.
> var4 doing store hazard. It doesn't matter which program type is used.
> load/store instructions are the same across program types.
>
> 3.
> prog_array was used as part of var1. I guess it was simply more
> convenient for Jann to do it this way :) All other map types
> have the same out-of-bounds speculation issue.
>
> In general side channels are cpu bugs that are exploited via sequences
> of cpu instructions. In that sense bpf infra provides these instructions.
> So all program types and all maps have the same level of 'side channel risk'.
>
>>> I believe Andy wants to expand the attack surface when
>>> kernel.unprivileged_bpf_disabled=0
>>> Before that happens I'd like the community to work on addressing the text above.
>>
>> Well, that text above can be removed when the BPF wizards are entirely sure
>> that BPF cannot be abused to exploit stuff.
>
> Myself and Daniel looked at it in detail. I think we understood
> MDS mechanism well enough. Right now we're fairly confident that
> combination of existing mechanisms we did for var4 and
> verifier speculative analysis protect us from MDS.
> The thing is that every new cpu bug is looked at through the bpf lenses.
> Can it be exploited through bpf? Complexity of side channels
> is growing. Can the most recent swapgs be exploited?
> What if we kprobe+bpf somewhere?
> I don't think there is an issue, but we will never be 'entirely sure'.
> Even if myself and Daniel are sure the concern will stay.
> Unprivileged bpf as a whole is the concern due to side channels.
> The number of them is not yet disclosed. Who is going to analyze them?
> imo the only answer to that is kernel.unprivileged_bpf_disabled=1
> which together with CONFIG_BPF_JIT_ALWAYS_ON is secure enough.
> The other option is to sprinkle every bpf load/store with lfence
> which will make execution so slow that it will be unusable.
> Which is effectively the same as unprivileged_bpf_disabled=1.
>
> There are other things we can do. Like kasan-style shadow memory
> for bpf execution. Auto re-JITing the code after it's running.
> We can do lfences everywhere for some time then re-JIT
> when kasan-ed shadow memory shows only clean memory accesses.
> The beauty of BPF is that it is an analyzable and JIT-able instruction set.
> The verifier speculative analysis is an example that the kernel
> can analyze the speculative execution path that cpu will
> take before the code starts executing.
> Unprivileged bpf can be made absolutely secure.
> It can be
> made more secure than the rest of the kernel.
> But today we should just go with unprivileged_bpf_disabled=1

I’m still okay with this.

> and CAP_BPF.
>

I think this needs more design work. I’m halfway through writing up an actual proposal. I’ll send it soon.
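
The binary knob discussed in this thread is the sysctl kernel.unprivileged_bpf_disabled. A minimal sketch of checking it from userspace follows, assuming the usual procfs location /proc/sys/kernel/unprivileged_bpf_disabled. The helper name and its `path` parameter are illustrative only and do not come from this thread or from any kernel interface.

```python
from pathlib import Path

# Assumed procfs location of the sysctl discussed in this thread.
SYSCTL_PATH = "/proc/sys/kernel/unprivileged_bpf_disabled"


def unprivileged_bpf_disabled(path=SYSCTL_PATH):
    """Return True or False for the knob, or None if it cannot be read.

    Any non-zero value is treated as "disabled". `path` is a parameter
    only so the helper can be exercised against a regular file.
    """
    try:
        raw = Path(path).read_text().strip()
        return int(raw) != 0
    except (OSError, ValueError):
        # Missing file, permission problem, or non-numeric content.
        return None


if __name__ == "__main__":
    state = unprivileged_bpf_disabled()
    if state is None:
        print("sysctl not readable on this system")
    else:
        print("unprivileged bpf disabled:", state)
```

The helper only reports the current setting; it says nothing about which capability a given load or attach operation would require.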