Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF

From: Andy Lutomirski <luto@amacapital.net>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	LSM List <linux-security-module@vger.kernel.org>,
	James Morris <jmorris@namei.org>, Jann Horn <jannh@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"David S. Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, kernel-team <kernel-team@fb.com>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
Date: Wed, 28 Aug 2019 17:53:52 -0700	[thread overview]
Message-ID: <260ADB71-A11C-4BFA-997E-9ECB2AE5D0D7@amacapital.net> (raw)
In-Reply-To: <DA52992F-4862-4945-8482-FE619A04C753@amacapital.net>

> On Aug 28, 2019, at 5:45 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
>>> On Aug 28, 2019, at 3:55 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Tue, Aug 27, 2019 at 11:12:29PM -0700, Andy Lutomirski wrote:
>>>>> 
>>>>> 
>>>>> From the previous discussion, you want to make progress toward solving
>>>>> a lot of problems with CAP_BPF.  One of them was making BPF
>>>>> firewalling more generally useful. By making CAP_BPF grant the ability
>>>>> to read kernel memory, you will make administrators much more nervous
>>>>> to grant CAP_BPF.
>>>> 
>>>> Andy, were your email hacked?
>>>> I explained several times that in this proposal
>>>> CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
>>>> CAP_BPF alone is _not enough_.
>>> 
>>> You have indeed said this many times.  You've stated it as a matter of
>>> fact as though it cannot possibly discussed.  I'm asking you to
>>> justify it.
>> 
>> That's not how I see it.
>> I kept stating that both CAP_BPF and CAP_TRACING are necessary to read
>> kernel memory whereas you kept distorting my statement by dropping second
>> part and then making claims that "CAP_BPF grant the ability to read
>> kernel memory, you will make administrators much more nervous".
> 
> Mea culpa. CAP_BPF does, however, appear to permit breaking kASLR due to unsafe pointer conversions, and it allows reading and writing everyone’s maps.  I stand by my overall point.
> 
>> 
>> Just s/CAP_BPF/CAP_BPF and CAP_TRACING/ in this above sentence.
>> See that meaning suddenly changes?
>> Now administrators would be worried about tasks that have both at once.
>> They also would be worried about tasks that have CAP_TRACING alone,
>> because that's what allows probe_kernel_read().
> 
> This is not all what I meant. Of course granting CAP_BPF+CAP_TRACING allows reading kernel memory. This is not at all a problem.  Here is a problem I see:
> 
> CAP_TRACING + CAP_BPF allows modification of other people’s maps and potentially other things that should not be implied by CAP_TRACING alone and that don’t need to be available to tracers. So CAP_TRACING, which is powerful but has somewhat limited scope, isn’t fully useful without CAP_BPF, and giving CAP_TRACING *and* CAP_BPF allows things that teachers shouldn’t be able to do. I think this would make the whole mechanism less useful to Android, for example.
> 
> (Also, I’m not sure quite what you mean by “CAP_TRACING ... allows probe_kernel_read()”. probe_kernel_read() is a kernel function that can’t be directly called by userspace. CAP_TRACING allows reading kernel memory in plenty of ways regardless.)
> 
>> 
>>> It seems like you are specifically trying to add a new switch to turn
>>> as much of BPF as possible on and off.  Why?
>> 
>> Didn't I explain it several times already with multiple examples
>> from systemd, daemons, bpftrace ?
>> 
>> Let's try again.
>> Take your laptop with linux distro.
>> You're the only user there. I'm assuming you're not sharing it with
>> partner and kids. This is my definition of 'single user system'.
>> You can sudo on it at any time, but obviously prefer to run as many
>> apps as possible without cap_sys_admin.
>> Now you found some awesome open source app on the web that monitors
>> the health of the kernel and will pop a nice message on a screen if
>> something is wrong. Currently this app needs root. You hesitate,
>> but the apps is so useful and it has strong upstream code review process
>> that you keep running it 24/7.
>> This is open source app. New versions come. You upgrade.
>> You have enough trust in that app that you keep running it as root.
>> But there is always a chance that new version doing accidentaly
>> something stupid as 'kill -9 -1'. It's an open source app at the end.
>> 
>> Now I come with this CAP* proposal to make this app safer.
>> I'm not making your system more secure and not making this app
>> more secure. I can only make your laptop safer for day to day work
>> by limiting the operations this app can do.
>> This particular app monitros the kernel via bpf and tracing.
>> Hence you can give it CAP_TRACING and CAP_BPF and drop the rest.
> 
> This won’t make me much more comfortable, since CAP_BPF lets it do an ever-growing set of nasty things. I’d much rather one or both of two things happen:
> 
> 1. Give it CAP_TRACING only. It can leak my data, but it’s rather hard for it to crash my laptop, lose data, or cause other shenanigans.
> 
> 2. Improve it a bit do all the privileged ops are wrapped by capset().
> 
> Does this make sense?  I’m a security person on occasion. I find vulnerabilities and exploit them deliberately and I break things by accident on a regular basis. In my considered opinion, CAP_TRACING alone, even extended to cover part of BPF as I’ve described, is decently safe. Getting root with just CAP_TRACING will be decently challenging, especially if I don’t get to read things like sshd’s memory, and improvements to mitigate even that could be added.  I am quite confident that attacks starting with CAP_TRACING will have clear audit signatures if auditing is on.  I am also confident that CAP_BPF *will* allow DoS and likely privilege escalation, and this will only get more likely as BPF gets more widely used. And, if BPF-based auditing ever becomes a thing, writing to the audit daemon’s maps will be a great way to cover one’s tracks.
> 
> 
>> I think they have no choice but to do kernel.unprivileged_bpf_disabled=1.
>> We, as a kernel community, are forcing the users into it.
>> Hence I really do not see a value in any proposal today that expands
>> unprivileged bpf usage.
> 
> I think you’re overemphasizing bpf’s role in the whole speculation mess. I realize that you’ve spent an insane amount of time on mitigations to stupid issues. I’ve spent a less insane amount of time on mitigating similar issues outside of bpf.  It’s a mess.  At the end of the day, the kernel does its best, and new bugs show up. New CPUs will be less buggy.

Bah, accidentally hit send.

If the kernel’s mitigations aren’t good enough or you’re subject to direct user attack (e.g. via insufficient IBPB, SMT attack, etc) then you’re vulnerable. Otherwise you’re less vulnerable. BPF is by no means the whole story. Heck, the kernel *could*, at unfortunate performance cost, more aggressively flush around BPF and effectively treat it like user code.

So I think we should design bpf’s API’s security with the philosophy that speculation attacks are just one more type of bug, and we should make sure that real-world useful configurations don’t give BPF to tasks that don’t need it. My unpriv proposal tries to do this. This is *why* my proposal keeps test_run locked down and restricts running each program type to tasks that are explicitly granted the ability to attach it.

So let’s let CAP_TRACING use bpf. Speculation attacks are mostly irrelevant to tracers anyway, but all the rest of the stuff I’ve been talking is relevant.