From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 939BE182C3 for ; Fri, 15 Mar 2024 20:16:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710533785; cv=none; b=A6u+tOkfpehpesZo+NuUDtYpf29tvvChJ5IWAxBqD2fQlCk2KHV+gsccwvHKMl2I/P8NARNpILygofYWPRI6wdf0dz2WoXHtRp4r5f3L3dDPaQx0TbL/rvkeWvQxua7wItswKQT1EaD+o2Q8EH+yT/lceOEL10bI054eFE5vTRI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710533785; c=relaxed/simple; bh=mbXLrkQrqSVUsMXrMuRA0u/MxjY9DterNR4RqBJiMDY=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=PgGlGYzW6EAAyl1oP6TlTVlx/lEXsBwBvWd+2x2sL6mD5yUA5ro1ira5Y6Upe2ZgzWRKpZsZNO/FNVkwg6USe6kkae2R9uz6DIKVg2cuO/1tcOw5elOxp3nViJTbeh4AnxiAOuMaHiv54qfqtLl6anVaxbfeww74L8WFCxNFHbk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=Lc/aNOsS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="Lc/aNOsS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A4D10C433C7; Fri, 15 Mar 2024 20:16:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1710533785; bh=mbXLrkQrqSVUsMXrMuRA0u/MxjY9DterNR4RqBJiMDY=; h=From:To:Cc:Subject:Date:Reply-to:From; b=Lc/aNOsSHBVxROW1e1+pj8b1WN8d3zUo98yTtlc7ZL12RiC2q36IB3aMLsamjATOD AzR1I4EMLAF8MiSfOg/6CMLnEHINTcCFd7J1uPO/C49LovnOi+RUyDItu1poUtJa5t dD/zp4mvbxR1DMxQFbgQ2QRSCh8TssKc9ZjIMS0s= From: Greg Kroah-Hartman To: linux-cve-announce@vger.kernel.org Cc: Greg Kroah-Hartman Subject: CVE-2021-47128: bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks Date: Fri, 15 Mar 2024 21:15:18 +0100 Message-ID: <2024031512-CVE-2021-47128-bef7@gregkh> X-Mailer: git-send-email 2.44.0 Precedence: bulk X-Mailing-List: linux-cve-announce@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Reply-to: , X-Developer-Signature: v=1; a=openpgp-sha256; l=10211; i=gregkh@linuxfoundation.org; h=from:subject:message-id; bh=mbXLrkQrqSVUsMXrMuRA0u/MxjY9DterNR4RqBJiMDY=; b=owGbwMvMwCRo6H6F97bub03G02pJDKlf1gSytJ6s+LBeZCO7vMF53l2TGaI3np+8V08tQ3a99 glL1yfLOmJZGASZGGTFFFm+bOM5ur/ikKKXoe1pmDmsTCBDGLg4BeAiPxnmZx/4eZfpbGH7et1J ++Mer1nbXyL/kmEOZ+lerqDckhdzGxxL4vsunfD6WFcPAA== X-Developer-Key: i=gregkh@linuxfoundation.org; a=openpgp; fpr=F4B60CC5BF78C2214A313DCB3147D40DDB2DFB29 Content-Transfer-Encoding: 8bit Description =========== In the Linux kernel, the following vulnerability has been resolved: bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit: 1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. 2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. Rough call sequence goes like: rq_lock(rq) -> -------------------------+ trace_sched_switch() -> | bpf_prog_xyz() -> +-> deadlock selinux_lockdown() -> | audit_log_end() -> | wake_up_interruptible() -> | try_to_wake_up() -> | rq_lock(rq) --------------+ What's worse is that the intention of 59438b46471a to further restrict lockdown settings for specific applications in respect to the global lockdown policy is completely broken for BPF. The SELinux policy rule for the current lockdown check looks something like this: allow : lockdown { }; However, this doesn't match with the 'current' task where the security_locked_down() is executed, example: httpd does a syscall. There is a tracing program attached to the syscall which triggers a BPF program to run, which ends up doing a bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does the permission check against 'current', that is, httpd in this example. httpd has literally zero relation to this tracing program, and it would be nonsensical having to write an SELinux policy rule against httpd to let the tracing helper pass. The policy in this case needs to be against the entity that is installing the BPF program. For example, if bpftrace would generate a histogram of syscall counts by user space application: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' bpftrace would then go and generate a BPF program from this internally. One way of doing it [for the sake of the example] could be to call bpf_get_current_task() helper and then access current->comm via one of bpf_probe_read_kernel{,_str}() helpers. So the program itself has nothing to do with httpd or any other random app doing a syscall here. The BPF program _explicitly initiated_ the lockdown check. The allow/deny policy belongs in the context of bpftrace: meaning, you want to grant bpftrace access to use these helpers, but other tracers on the system like my_random_tracer _not_. Therefore fix all three issues at the same time by taking a completely different approach for the security_locked_down() hook, that is, move the check into the program verification phase where we actually retrieve the BPF func proto. This also reliably gets the task (current) that is trying to install the BPF tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helper's fast-path which can be called several millions of times per second. The check is then also in line with other security_locked_down() hooks in the system where the enforcement is performed at open/load time, for example, open_kcore() for /proc/kcore access or module_sig_check() for module signatures just to pick few random ones. What's out of scope in the fix as well as in other security_locked_down() hook locations /outside/ of BPF subsystem is that if the lockdown policy changes on the fly there is no retrospective action. This requires a different discussion, potentially complex infrastructure, and it's also not clear whether this can be solved generically. Either way, it is out of scope for a suitable stable fix which this one is targeting. Note that the breakage is specifically on 59438b46471a where it started to rely on 'current' as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode"). [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says: I starting seeing this with F-34. When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. [...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [...] [1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/, Serhei Makarov says: Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on ppc64le. Example stack trace: [...] [ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [ 730.883375] audit_log_end+0x5b/0x100 [ 730.884104] slow_avc_audit+0x69/0x90 [ 730.884836] avc_has_perm+0x8b/0xb0 [ 730.885532] selinux_lockdown+0xa5/0xd0 [ 730.886297] security_locked_down+0x20/0x40 [ 730.887133] bpf_probe_read_compat+0x66/0xd0 [ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820 [ 730.888917] trace_call_bpf+0xe9/0x240 [ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0 [ 730.890579] perf_trace_sched_switch+0x142/0x180 [ 730.891485] ? __schedule+0x6d8/0xb20 [ 730.892209] __schedule+0x6d8/0xb20 [ 730.892899] schedule+0x5b/0xc0 [ 730.893522] exit_to_user_mode_prepare+0x11d/0x240 [ 730.894457] syscall_exit_to_user_mode+0x27/0x70 [ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae [...] The Linux kernel CVE team has assigned CVE-2021-47128 to this issue. Affected and fixed versions =========================== Issue introduced in 5.6 with commit 59438b46471a and fixed in 5.10.43 with commit ff5039ec75c8 Issue introduced in 5.6 with commit 59438b46471a and fixed in 5.12.10 with commit acc43fc6cf0d Issue introduced in 5.6 with commit 59438b46471a and fixed in 5.13 with commit ff40e51043af Please see https://www.kernel.org or a full list of currently supported kernel versions by the kernel community. Unaffected versions might change over time as fixes are backported to older supported kernel versions. The official CVE entry at https://cve.org/CVERecord/?id=CVE-2021-47128 will be updated if fixes are backported, please check that for the most up to date information about this issue. Affected files ============== The file(s) affected by this issue are: kernel/bpf/helpers.c kernel/trace/bpf_trace.c Mitigation ========== The Linux kernel CVE team recommends that you update to the latest stable kernel version for this, and many other bugfixes. Individual changes are never tested alone, but rather are part of a larger kernel release. Cherry-picking individual commits is not recommended or supported by the Linux kernel community at all. If however, updating to the latest release is impossible, the individual changes to resolve this issue can be found at these commits: https://git.kernel.org/stable/c/ff5039ec75c83d2ed5b781dc7733420ee8c985fc https://git.kernel.org/stable/c/acc43fc6cf0d50612193813c5906a1ab9d433e1e https://git.kernel.org/stable/c/ff40e51043af63715ab413995ff46996ecf9583f