From: Peter Collingbourne <pcc@google.com> To: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com>, "Eric W. Biederman" <ebiederm@xmission.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Daniel Bristot de Oliveira <bristot@redhat.com>, Thomas Gleixner <tglx@linutronix.de>, Andy Lutomirski <luto@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Masahiro Yamada <masahiroy@kernel.org>, Sami Tolvanen <samitolvanen@google.com>, YiFei Zhu <yifeifz2@illinois.edu>, Colin Ian King <colin.king@canonical.com>, Mark Rutland <mark.rutland@arm.com>, Frederic Weisbecker <frederic@kernel.org>, Viresh Kumar <viresh.kumar@linaro.org>, Andrey Konovalov <andreyknvl@gmail.com>, Gabriel Krisman Bertazi <krisman@collabora.com>, Balbir Singh <sblbir@amazon.com>, Chris Hyser <chris.hyser@oracle.com>, Daniel Vetter <daniel.vetter@ffwll.ch>, Chris Wilson <chris@chris-wilson.co.uk>, Arnd Bergmann <arnd@arndb.de>, Dmitry Vyukov <dvyukov@google.com>, Christian Brauner <christian.brauner@ubuntu.com>, Alexey Gladkov <legion@kernel.org>, Ran Xiaokai <ran.xiaokai@zte.com.cn>, David Hildenbrand <david@redhat.com>, Xiaofeng Cao <caoxiaofeng@yulong.com>, Cyrill Gorcunov <gorcunov@gmail.com>, Thomas Cedeno <thomascedeno@google.com>, Marco Elver <elver@google.com>, Alexander Potapenko <glider@google.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux ARM <linux-arm-kernel@lists.infradead.org>, Evgenii Stepanov <eugenis@google.com> Subject: Re: [PATCH] kernel: introduce prctl(PR_LOG_UACCESS) Date: Mon, 22 Nov 2021 21:17:54 -0800 [thread overview] Message-ID: <CAMn1gO4mbxb8ukCb5Ne6FM4R4QDQQ+rMtO6TROfKpzFr9gsbTg@mail.gmail.com> (raw) In-Reply-To: <202109251909.B7BB577BA@keescook> On Sat, Sep 25, 2021 at 7:20 PM Kees Cook <keescook@chromium.org> wrote: > > On Fri, Sep 24, 2021 at 02:50:04PM -0700, Peter Collingbourne wrote: > > On Wed, Sep 22, 2021 at 8:59 AM Jann Horn <jannh@google.com> wrote: > > > > > > On Wed, Sep 22, 2021 at 5:30 PM Kees Cook <keescook@chromium.org> wrote: > > > > On Wed, Sep 22, 2021 at 09:23:10AM -0500, Eric W. Biederman wrote: > > > > > Peter Collingbourne <pcc@google.com> writes: > > > > > > This patch introduces a kernel feature known as uaccess logging. > > > > > > [...] > > > > > [...] > > > > > How is logging the kernel's activity like this not a significant > > > > > information leak? How is this safe for unprivileged users? > > > > [...] > > > > Regardless, this is a pretty useful tool for this kind of fuzzing. > > > > Perhaps the timing exposure could be mitigated by having the kernel > > > > collect the record in a separate kernel-allocated buffer and flush the > > > > results to userspace at syscall exit? (This would solve the > > > > copy_to_user() recursion issue too.) > > > > Seems reasonable. I suppose that in terms of timing information we're > > already (unavoidably) exposing how long the syscall took overall, and > > we probably shouldn't deliberately expose more than that. > > Right -- I can't think of anything that can really use this today, > but it very much feels like the kind of information that could aid in > a timing race. Okay, this now goes via a kernel-allocated buffer. > > That being said, I'm wondering if that has security implications on > > its own if it's then possible for userspace to manipulate the kernel > > into allocating a large buffer (either at prctl() time or as a result > > of getting the kernel to do a large number of uaccesses). Perhaps it > > can be mitigated by limiting the size of the uaccess buffer provided > > at prctl() time. > > There are a lot of exact-size allocation controls already (which I think > is an unavoidable but separate issue[1]), but perhaps this could be > mitigated by making the reserved buffer be PAGE_SIZE granular? I was more thinking about userspace causing a kernel OOM or something by making the kernel allocate large buffers. I decided to mitigate it by putting an upper limit on the size of the kernel-side buffer. Since it sounds like exact-size allocations are a pre-existing issue we probably don't need to do anything about them at this time. > > > One aspect that might benefit from some clarification on intended > > > behavior is: what should happen if there are BPF tracing programs > > > running (possibly as part of some kind of system-wide profiling or > > > such) that poke around in userspace memory with BPF's uaccess helpers > > > (especially "bpf_copy_from_user")? > > > > I think we should probably be ignoring those accesses, since we cannot > > know a priori whether the accesses are directly associated with the > > syscall or not, and this is after all a best-effort mechanism. > > Perhaps the "don't log this uaccess" flag I suggested could be > repurposed by BPF too, as a general "make this access invisible to > PR_LOG_UACCESS" flag? i.e. this bit: Since we ended up not needing this flag (because of the kernel-side buffer) I ended up just making BPF use raw_copy_from_user(). > > > > Instead of reimplementing copy_*_user() with a new wrapper that > > > > bypasses some checks and adds others and has to stay in sync, etc, > > > > how about just adding a "recursion" flag? Something like: > > > > > > > > copy_from_user(...) > > > > instrument_copy_from_user(...) > > > > uaccess_buffer_log_read(...) > > > > if (current->uaccess_buffer.writing) > > > > return; > > > > uaccess_buffer_log(...) > > > > current->uaccess_buffer.writing = true; > > > > copy_to_user(...) > > > > current->uaccess_buffer.writing = false; > > > > > > > This would likely only make sense for SECCOMP_RET_TRACE or _TRAP if the > > > > program wants to collect the results after every syscall. And maybe this > > > > won't make any sense across exec (losing the mm that was used during > > > > SECCOMP_SET_UACCESS_TRACE_BUFFER). Hmmm. > > > > > > And then I guess your plan would be that userspace would be expected > > > to use the userspace instruction pointer > > > (seccomp_data::instruction_pointer) to indicate instructions that > > > should be traced? > > That could be one way -- but seccomp filters would allow a bunch of > ways. > > > > > > > Or instead of seccomp, you could do it kinda like > > > https://www.kernel.org/doc/html/latest/admin-guide/syscall-user-dispatch.html > > > , with a prctl that specifies a specific instruction pointer? > > > > Given a choice between these two options, I would prefer the prctl() > > because userspace programs may already be using seccomp filters and > > sanitizers shouldn't interfere with it. > > That's fair -- the "I wish we could make complex decisions about which > syscalls to act on" sounds like seccomp. > > > However, in either the seccomp filter or prctl() case, you still have > > the problem of deciding where to log to. Keep in mind that you would > > need to prevent intervening async signals (that occur between when the > > syscall happens and when we read the log) from triggering additional > > Could the sig handler also set the "make the uaccess invisible" flag? > (It would need to be a "depth" flag, most likely.) It's more complicated than that because you can longjmp() out of a signal handler and that won't necessarily call sigreturn(). The kernel doesn't really have a concept of "depth" as applied to signal handlers, it's all managed on the userspace stack. I brainstormed this with Dmitry a bit out of band and we came up with a nice solution that avoids the two syscalls, is arch-generic and avoids the problem with asynchronous signal handlers. I'll paste a bit from the documentation that I wrote, but please see the full documentation in v2 patch 5/5 for more details. The feature may be used via the following prctl: .. code-block:: c uint64_t addr = 0; /* Generally will be a TLS slot or equivalent */ prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &addr, 0, 0, 0); Supplying a non-zero address as the second argument to ``prctl`` will cause the kernel to read an address from that address on each kernel entry (referred to as the *uaccess descriptor address*). When entering the kernel to handle a syscall with a non-zero uaccess descriptor address, the kernel will read a data structure of type ``struct uaccess_descriptor`` from the uaccess descriptor address, which is defined as follows: .. code-block:: c struct uaccess_descriptor { uint64_t addr, size; }; This data structure contains the address and size (in array elements) of a *uaccess buffer*, which is an array of data structures of type ``struct uaccess_buffer_entry``. Before returning to userspace, the kernel will log information about uaccesses to sequential entries in the uaccess buffer. It will also store ``NULL`` to the uaccess descriptor address, and store the address and size of the unused portion of the uaccess buffer to the uaccess descriptor. [...] When entering the kernel for a reason other than a syscall (for example, when IPI'd due to an incoming asynchronous signal) with a non-zero uaccess descriptor address, any signals other than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling ``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been initialized with ``sigfillset(set)``. This is to prevent incoming signals from interfering with uaccess logging. Peter
WARNING: multiple messages have this Message-ID (diff)
From: Peter Collingbourne <pcc@google.com> To: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com>, "Eric W. Biederman" <ebiederm@xmission.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Daniel Bristot de Oliveira <bristot@redhat.com>, Thomas Gleixner <tglx@linutronix.de>, Andy Lutomirski <luto@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Masahiro Yamada <masahiroy@kernel.org>, Sami Tolvanen <samitolvanen@google.com>, YiFei Zhu <yifeifz2@illinois.edu>, Colin Ian King <colin.king@canonical.com>, Mark Rutland <mark.rutland@arm.com>, Frederic Weisbecker <frederic@kernel.org>, Viresh Kumar <viresh.kumar@linaro.org>, Andrey Konovalov <andreyknvl@gmail.com>, Gabriel Krisman Bertazi <krisman@collabora.com>, Balbir Singh <sblbir@amazon.com>, Chris Hyser <chris.hyser@oracle.com>, Daniel Vetter <daniel.vetter@ffwll.ch>, Chris Wilson <chris@chris-wilson.co.uk>, Arnd Bergmann <arnd@arndb.de>, Dmitry Vyukov <dvyukov@google.com>, Christian Brauner <christian.brauner@ubuntu.com>, Alexey Gladkov <legion@kernel.org>, Ran Xiaokai <ran.xiaokai@zte.com.cn>, David Hildenbrand <david@redhat.com>, Xiaofeng Cao <caoxiaofeng@yulong.com>, Cyrill Gorcunov <gorcunov@gmail.com>, Thomas Cedeno <thomascedeno@google.com>, Marco Elver <elver@google.com>, Alexander Potapenko <glider@google.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux ARM <linux-arm-kernel@lists.infradead.org>, Evgenii Stepanov <eugenis@google.com> Subject: Re: [PATCH] kernel: introduce prctl(PR_LOG_UACCESS) Date: Mon, 22 Nov 2021 21:17:54 -0800 [thread overview] Message-ID: <CAMn1gO4mbxb8ukCb5Ne6FM4R4QDQQ+rMtO6TROfKpzFr9gsbTg@mail.gmail.com> (raw) In-Reply-To: <202109251909.B7BB577BA@keescook> On Sat, Sep 25, 2021 at 7:20 PM Kees Cook <keescook@chromium.org> wrote: > > On Fri, Sep 24, 2021 at 02:50:04PM -0700, Peter Collingbourne wrote: > > On Wed, Sep 22, 2021 at 8:59 AM Jann Horn <jannh@google.com> wrote: > > > > > > On Wed, Sep 22, 2021 at 5:30 PM Kees Cook <keescook@chromium.org> wrote: > > > > On Wed, Sep 22, 2021 at 09:23:10AM -0500, Eric W. Biederman wrote: > > > > > Peter Collingbourne <pcc@google.com> writes: > > > > > > This patch introduces a kernel feature known as uaccess logging. > > > > > > [...] > > > > > [...] > > > > > How is logging the kernel's activity like this not a significant > > > > > information leak? How is this safe for unprivileged users? > > > > [...] > > > > Regardless, this is a pretty useful tool for this kind of fuzzing. > > > > Perhaps the timing exposure could be mitigated by having the kernel > > > > collect the record in a separate kernel-allocated buffer and flush the > > > > results to userspace at syscall exit? (This would solve the > > > > copy_to_user() recursion issue too.) > > > > Seems reasonable. I suppose that in terms of timing information we're > > already (unavoidably) exposing how long the syscall took overall, and > > we probably shouldn't deliberately expose more than that. > > Right -- I can't think of anything that can really use this today, > but it very much feels like the kind of information that could aid in > a timing race. Okay, this now goes via a kernel-allocated buffer. > > That being said, I'm wondering if that has security implications on > > its own if it's then possible for userspace to manipulate the kernel > > into allocating a large buffer (either at prctl() time or as a result > > of getting the kernel to do a large number of uaccesses). Perhaps it > > can be mitigated by limiting the size of the uaccess buffer provided > > at prctl() time. > > There are a lot of exact-size allocation controls already (which I think > is an unavoidable but separate issue[1]), but perhaps this could be > mitigated by making the reserved buffer be PAGE_SIZE granular? I was more thinking about userspace causing a kernel OOM or something by making the kernel allocate large buffers. I decided to mitigate it by putting an upper limit on the size of the kernel-side buffer. Since it sounds like exact-size allocations are a pre-existing issue we probably don't need to do anything about them at this time. > > > One aspect that might benefit from some clarification on intended > > > behavior is: what should happen if there are BPF tracing programs > > > running (possibly as part of some kind of system-wide profiling or > > > such) that poke around in userspace memory with BPF's uaccess helpers > > > (especially "bpf_copy_from_user")? > > > > I think we should probably be ignoring those accesses, since we cannot > > know a priori whether the accesses are directly associated with the > > syscall or not, and this is after all a best-effort mechanism. > > Perhaps the "don't log this uaccess" flag I suggested could be > repurposed by BPF too, as a general "make this access invisible to > PR_LOG_UACCESS" flag? i.e. this bit: Since we ended up not needing this flag (because of the kernel-side buffer) I ended up just making BPF use raw_copy_from_user(). > > > > Instead of reimplementing copy_*_user() with a new wrapper that > > > > bypasses some checks and adds others and has to stay in sync, etc, > > > > how about just adding a "recursion" flag? Something like: > > > > > > > > copy_from_user(...) > > > > instrument_copy_from_user(...) > > > > uaccess_buffer_log_read(...) > > > > if (current->uaccess_buffer.writing) > > > > return; > > > > uaccess_buffer_log(...) > > > > current->uaccess_buffer.writing = true; > > > > copy_to_user(...) > > > > current->uaccess_buffer.writing = false; > > > > > > > This would likely only make sense for SECCOMP_RET_TRACE or _TRAP if the > > > > program wants to collect the results after every syscall. And maybe this > > > > won't make any sense across exec (losing the mm that was used during > > > > SECCOMP_SET_UACCESS_TRACE_BUFFER). Hmmm. > > > > > > And then I guess your plan would be that userspace would be expected > > > to use the userspace instruction pointer > > > (seccomp_data::instruction_pointer) to indicate instructions that > > > should be traced? > > That could be one way -- but seccomp filters would allow a bunch of > ways. > > > > > > > Or instead of seccomp, you could do it kinda like > > > https://www.kernel.org/doc/html/latest/admin-guide/syscall-user-dispatch.html > > > , with a prctl that specifies a specific instruction pointer? > > > > Given a choice between these two options, I would prefer the prctl() > > because userspace programs may already be using seccomp filters and > > sanitizers shouldn't interfere with it. > > That's fair -- the "I wish we could make complex decisions about which > syscalls to act on" sounds like seccomp. > > > However, in either the seccomp filter or prctl() case, you still have > > the problem of deciding where to log to. Keep in mind that you would > > need to prevent intervening async signals (that occur between when the > > syscall happens and when we read the log) from triggering additional > > Could the sig handler also set the "make the uaccess invisible" flag? > (It would need to be a "depth" flag, most likely.) It's more complicated than that because you can longjmp() out of a signal handler and that won't necessarily call sigreturn(). The kernel doesn't really have a concept of "depth" as applied to signal handlers, it's all managed on the userspace stack. I brainstormed this with Dmitry a bit out of band and we came up with a nice solution that avoids the two syscalls, is arch-generic and avoids the problem with asynchronous signal handlers. I'll paste a bit from the documentation that I wrote, but please see the full documentation in v2 patch 5/5 for more details. The feature may be used via the following prctl: .. code-block:: c uint64_t addr = 0; /* Generally will be a TLS slot or equivalent */ prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &addr, 0, 0, 0); Supplying a non-zero address as the second argument to ``prctl`` will cause the kernel to read an address from that address on each kernel entry (referred to as the *uaccess descriptor address*). When entering the kernel to handle a syscall with a non-zero uaccess descriptor address, the kernel will read a data structure of type ``struct uaccess_descriptor`` from the uaccess descriptor address, which is defined as follows: .. code-block:: c struct uaccess_descriptor { uint64_t addr, size; }; This data structure contains the address and size (in array elements) of a *uaccess buffer*, which is an array of data structures of type ``struct uaccess_buffer_entry``. Before returning to userspace, the kernel will log information about uaccesses to sequential entries in the uaccess buffer. It will also store ``NULL`` to the uaccess descriptor address, and store the address and size of the unused portion of the uaccess buffer to the uaccess descriptor. [...] When entering the kernel for a reason other than a syscall (for example, when IPI'd due to an incoming asynchronous signal) with a non-zero uaccess descriptor address, any signals other than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling ``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been initialized with ``sigfillset(set)``. This is to prevent incoming signals from interfering with uaccess logging. Peter _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-11-23 5:18 UTC|newest] Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-09-22 6:18 [PATCH] kernel: introduce prctl(PR_LOG_UACCESS) Peter Collingbourne 2021-09-22 6:18 ` Peter Collingbourne 2021-09-22 6:30 ` Cyrill Gorcunov 2021-09-22 6:30 ` Cyrill Gorcunov 2021-11-23 5:17 ` Peter Collingbourne 2021-11-23 5:17 ` Peter Collingbourne 2021-09-22 10:44 ` Marco Elver 2021-09-22 10:44 ` Marco Elver 2021-11-23 5:17 ` Peter Collingbourne 2021-11-23 5:17 ` Peter Collingbourne 2021-09-22 13:45 ` Dmitry Vyukov 2021-09-22 13:45 ` Dmitry Vyukov 2021-09-22 22:30 ` Peter Collingbourne 2021-09-22 22:30 ` Peter Collingbourne 2021-09-22 14:23 ` Eric W. Biederman 2021-09-22 14:23 ` Eric W. Biederman 2021-09-22 15:30 ` Kees Cook 2021-09-22 15:30 ` Kees Cook 2021-09-22 15:59 ` Jann Horn 2021-09-22 15:59 ` Jann Horn 2021-09-24 21:50 ` Peter Collingbourne 2021-09-24 21:50 ` Peter Collingbourne 2021-09-26 2:20 ` Kees Cook 2021-09-26 2:20 ` Kees Cook 2021-11-23 5:17 ` Peter Collingbourne [this message] 2021-11-23 5:17 ` Peter Collingbourne 2021-09-22 17:46 ` David Hildenbrand 2021-09-22 17:46 ` David Hildenbrand 2021-09-22 19:22 ` Steven Rostedt 2021-09-22 19:22 ` Steven Rostedt 2021-09-22 19:53 ` Peter Zijlstra 2021-09-22 19:53 ` Peter Zijlstra 2021-09-22 22:05 ` Peter Collingbourne 2021-09-22 22:05 ` Peter Collingbourne 2021-09-23 8:08 ` David Hildenbrand 2021-09-23 8:08 ` David Hildenbrand
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAMn1gO4mbxb8ukCb5Ne6FM4R4QDQQ+rMtO6TROfKpzFr9gsbTg@mail.gmail.com \ --to=pcc@google.com \ --cc=akpm@linux-foundation.org \ --cc=andreyknvl@gmail.com \ --cc=arnd@arndb.de \ --cc=bristot@redhat.com \ --cc=bsegall@google.com \ --cc=caoxiaofeng@yulong.com \ --cc=catalin.marinas@arm.com \ --cc=chris.hyser@oracle.com \ --cc=chris@chris-wilson.co.uk \ --cc=christian.brauner@ubuntu.com \ --cc=colin.king@canonical.com \ --cc=daniel.vetter@ffwll.ch \ --cc=david@redhat.com \ --cc=dietmar.eggemann@arm.com \ --cc=dvyukov@google.com \ --cc=ebiederm@xmission.com \ --cc=elver@google.com \ --cc=eugenis@google.com \ --cc=frederic@kernel.org \ --cc=glider@google.com \ --cc=gorcunov@gmail.com \ --cc=jannh@google.com \ --cc=juri.lelli@redhat.com \ --cc=keescook@chromium.org \ --cc=krisman@collabora.com \ --cc=legion@kernel.org \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=luto@kernel.org \ --cc=mark.rutland@arm.com \ --cc=masahiroy@kernel.org \ --cc=mgorman@suse.de \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=ran.xiaokai@zte.com.cn \ --cc=rostedt@goodmis.org \ --cc=samitolvanen@google.com \ --cc=sblbir@amazon.com \ --cc=tglx@linutronix.de \ --cc=thomascedeno@google.com \ --cc=vincent.guittot@linaro.org \ --cc=viresh.kumar@linaro.org \ --cc=will@kernel.org \ --cc=yifeifz2@illinois.edu \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.