From: Daniel Borkmann <daniel@iogearbox.net>
To: Paul Moore <paul@paul-moore.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>,
    linux-security-module@vger.kernel.org,
    James Morris <jmorris@namei.org>,
    Steven Rostedt <rostedt@goodmis.org>,
    Ingo Molnar <mingo@redhat.com>,
    Stephen Smalley <stephen.smalley.work@gmail.com>,
    selinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Casey Schaufler <casey@schaufler-ca.com>,
    jolsa@redhat.com
Subject: Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks
Date: Fri, 28 May 2021 20:10:34 +0200
Message-ID: <c7c2d7e1-e253-dce0-d35c-392192e4926e@iogearbox.net>
In-Reply-To: <CAHC9VhR-kYmMA8gsqkiL5=poN9FoL-uCyx1YOLCoG2hRiUBYug@mail.gmail.com>

On 5/28/21 5:47 PM, Paul Moore wrote:
> On Fri, May 28, 2021 at 3:10 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 5/28/21 3:37 AM, Paul Moore wrote:
>>> On Mon, May 17, 2021 at 5:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>>>>
>>>> Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
>>>> lockdown") added an implementation of the locked_down LSM hook to
>>>> SELinux, with the aim to restrict which domains are allowed to perform
>>>> operations that would breach lockdown.
>>>>
>>>> However, in several places the security_locked_down() hook is called in
>>>> situations where the current task isn't doing any action that would
>>>> directly breach lockdown, leading to SELinux checks that are basically
>>>> bogus.
>>>>
>>>> Since in most of these situations converting the callers such that
>>>> security_locked_down() is called in a context where the current task
>>>> would be meaningful for SELinux is impossible or very non-trivial (and
>>>> could lead to TOCTOU issues for the classic Lockdown LSM
>>>> implementation), fix this by modifying the hook to accept a struct cred
>>>> pointer as argument, where NULL will be interpreted as a request for a
>>>> "global", task-independent lockdown decision only. Then modify SELinux
>>>> to ignore calls with cred == NULL.
>>>
>>> I'm not overly excited about skipping the access check when cred is
>>> NULL. Based on the description and the little bit that I've dug into
>>> thus far it looks like using SECINITSID_KERNEL as the subject would be
>>> much more appropriate. *Something* (the kernel in most of the
>>> relevant cases it looks like) is requesting that a potentially
>>> sensitive disclosure be made, and ignoring it seems like the wrong
>>> thing to do. Leaving the access control intact also provides a nice
>>> avenue to audit these requests should users want to do that.
>>
>> I think the rationale/workaround for ignoring calls with cred == NULL (or the previous
>> patch with the unimplemented hook) from Ondrej was two-fold, at least speaking for his
>> seen tracing cases:
>>
>>    i) The audit events that are triggered due to calls to security_locked_down()
>>       can OOM kill a machine, see below details [0].
>>
>>   ii) It seems to be causing a deadlock via slow_avc_audit() -> audit_log_end()
>>       when presumingly trying to wake up kauditd [1].
>>
>> How would your suggestion above solve both i) and ii)?
>
> First off, a bit of general commentary - I'm not sure if Ondrej was
> aware of this, but info like that is good to have in the commit
> description.
> Perhaps it was in the linked RHBZ but I try not to look
> at those when reviewing patches; the commit descriptions must be
> self-sufficient since we can't rely on the accessibility or the
> lifetime of external references. It's fine if people want to include
> external links in their commits, I would actually even encourage it in
> some cases, but the links shouldn't replace a proper description of
> the problem and why the proposed solution is The Best Solution.
>
> With that out of the way, it sounds like your issue isn't so much the
> access check, but rather the frequency of the access denials and the
> resulting audit records in your particular use case. My initial
> reaction is that you might want to understand why you are getting so
> many SELinux access denials, your loaded security policy clearly does
> not match with your intended use :) Beyond that, if you want to
> basically leave things as-is but quiet the high frequency audit
> records that result from these SELinux denials you might want to look
> into the SELinux "dontaudit" policy rule, it was created for things
> like this. Some info can be found in The SELinux Notebook, relevant
> link below:
>
> * https://github.com/SELinuxProject/selinux-notebook/blob/main/src/avc_rules.md#dontaudit
>
> The deadlock issue that was previously reported remains an open case
> as far as I'm concerned; I'm presently occupied trying to sort out a
> rather serious issue with respect to io_uring and LSM/audit (plus
> general stuff at $DAYJOB) so I haven't had time to investigate this
> any further.
> Of course anyone else is welcome to dive into it (I
> always want to encourage this, especially from "performance people"
> who just want to shut it all off), however if the answer is basically
> "disable LSM and/or audit checks" you have to know that it is going to
> result in a high degree of skepticism from me, so heavy documentation
> on why it is The Best Solution would be a very good thing :) Beyond
> that, I think the suggestions above of "why do you have so many policy
> denials?" and "have you looked into dontaudit?" are solid places to
> look for a solution in your particular case.
>
>>>> Since most callers will just want to pass current_cred() as the cred
>>>> parameter, rename the hook to security_cred_locked_down() and provide
>>>> the original security_locked_down() function as a simple wrapper around
>>>> the new hook.
>>
>> [...]
>>>
>>>> 3. kernel/trace/bpf_trace.c:bpf_probe_read_kernel{,_str}_common()
>>>>      Called when a BPF program calls a helper that could leak kernel
>>>>      memory. The task context is not relevant here, since the program
>>>>      may very well be run in the context of a different task than the
>>>>      consumer of the data.
>>>>      See: https://bugzilla.redhat.com/show_bug.cgi?id=1955585
>>>
>>> The access control check isn't so much who is consuming the data, but
>>> who is requesting a potential violation of a "lockdown", yes? For
>>> example, the SELinux policy rule for the current lockdown check looks
>>> something like this:
>>>
>>>   allow <who> <who> : lockdown { <reason> };
>>>
>>> It seems to me that the task context is relevant here and performing
>>> the access control check based on the task's domain is correct.
>>
>> This doesn't make much sense to me, it's /not/ the task 'requesting a potential
>> violation of a "lockdown"', but rather the running tracing program which is e.g.
>> inspecting kernel data structures around the triggered event.
>> If I understood
>> you correctly, having an 'allow' check on, say, httpd would be rather odd since
>> things like perf/bcc/bpftrace/systemtap/etc is installing the tracing probe instead.
>>
>> Meaning, if we would /not/ trace such events (like in the prior mentioned syscall
>> example), then there is also no call to the security_locked_down() from that same/
>> unmodified application.
>
> My turn to say that you don't make much sense to me :)
>
> Let's reset.

Sure, yep, let's briefly take one step back. :)

> What task_struct is running the BPF tracing program which is calling
> into security_locked_down()? My current feeling is that it is this
> context/domain/cred that should be used for the access control check;
> in the cases where it is a kernel thread, I think passing NULL is
> reasonable, but I think the proper thing for SELinux is to interpret
> NULL as kernel_t.

If this were a typical LSM hook and, say, your app calls into bind(2),
where we then invoke security_socket_bind() and check the 'current' task,
then I'm all with you, because this was _explicitly initiated_ by the
httpd app, so that allow/deny policy belongs in the context of httpd.

In the case of tracing, it's different. You install small programs that
are triggered when certain events fire. A random example from bpftrace's
README [0]: say you want to generate a histogram of syscall counts by
program. The one-liner is:

   bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

bpftrace then goes and generates a BPF prog from this internally. One way
of doing it could be to call the bpf_get_current_task() helper and then
access current->comm via one of the bpf_probe_read_kernel{,_str}() helpers.
So the program itself has nothing to do with httpd or any other random app
doing a syscall here. The BPF prog _explicitly initiated_ the lockdown check.
The allow/deny policy belongs in the context of bpftrace: meaning, you want
to grant bpftrace access to use these helpers, but not other tracers on the
system such as my_random_tracer. While this works for the previously
mentioned cases of security_locked_down() with open_kcore() for /proc/kcore
access or the module_sig_check(), it is broken for tracing as-is, and the
patch I sent earlier fixes this.

Thanks,
Daniel

   [0] https://github.com/iovisor/bpftrace
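
For readers following the thread, the hook change described in the quoted
patch (a cred argument where NULL requests only the "global" decision, plus
the original entry point kept as a wrapper) can be sketched as a toy
userspace model. This is only an illustration under stated assumptions:
every identifier prefixed demo_ or DEMO_ is hypothetical, the two checks
merely stand in for the Lockdown and SELinux implementations, and none of it
is the kernel code from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Toy userspace model. Everything prefixed demo_/DEMO_ is hypothetical. */

enum demo_reason { DEMO_NONE = 0, DEMO_KCORE, DEMO_BPF_READ };

/* Stand-in for a task's credentials: one flag instead of a security label. */
struct demo_cred {
	int may_breach_lockdown;
};

/* Global lockdown level: reasons up to and including it are locked down. */
static enum demo_reason demo_level = DEMO_NONE;

/* Stand-in for current_cred(): credentials of whichever task is running. */
static const struct demo_cred *demo_current;

/* Task-independent decision, in the spirit of the classic Lockdown LSM. */
int demo_global_check(enum demo_reason what)
{
	return (what != DEMO_NONE && demo_level >= what) ? -EPERM : 0;
}

/* Per-task decision; a NULL cred means "no meaningful task", so skip it. */
int demo_task_check(const struct demo_cred *cred)
{
	if (cred == NULL)
		return 0;
	return cred->may_breach_lockdown ? 0 : -EPERM;
}

/* New-style hook: the caller states whose credentials matter, or NULL. */
int demo_security_cred_locked_down(const struct demo_cred *cred,
				   enum demo_reason what)
{
	int ret = demo_global_check(what);

	return ret ? ret : demo_task_check(cred);
}

/* Original entry point kept as a thin wrapper around the new hook. */
int demo_security_locked_down(enum demo_reason what)
{
	return demo_security_cred_locked_down(demo_current, what);
}
```

In this model a tracing helper would call
demo_security_cred_locked_down(NULL, DEMO_BPF_READ) and receive only the
global answer, while a caller acting on behalf of the current task keeps
using the wrapper. Paul's alternative from the thread, treating NULL as the
kernel's own domain rather than skipping the check, would change only
demo_task_check().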