stable.vger.kernel.org archive mirror
* [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}()
@ 2021-04-21 13:05 Zidenberg, Tsahi
  2021-04-21 13:07 ` [PATCH 1/8] uaccess: Add strict non-pagefault kernel-space read function Zidenberg, Tsahi
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:05 UTC (permalink / raw)
  To: stable; +Cc: Greg KH

On arm64, kernel-space address accessors cannot be used to access
user-space addresses, which means bpf_probe_read{,str}() cannot read
user-space memory. As a result, command-line parameters, for example,
do not appear when snooping execve using bpf.

This patch series takes the upstream solution. Note that the upstream
solution also changes the user-visible API in the following ways (a short
usage sketch follows the list):
* Add probe_read_{user, kernel}{,_str} bpf helpers
* Add skb_output helper to the enum only (calling it is not supported)
* Add support for %pks, %pus specifiers
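
As an illustration only (not part of the series), here is a minimal
libbpf-style BPF C sketch of the intended usage: reading the execve
filename through the new user-space helper. The tracepoint context layout,
the PATHLEN value and the bpf_printk() wrapper are assumptions taken from
common libbpf examples, not from this patchset.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define PATHLEN 256

/* Layout of tracepoint syscalls:sys_enter_execve (check its format file). */
struct sys_enter_execve_ctx {
    unsigned long long common;   /* common tracepoint header */
    long syscall_nr;
    const char *filename;        /* user-space pointer */
    const char *const *argv;     /* user-space pointer */
    const char *const *envp;     /* user-space pointer */
};

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(struct sys_enter_execve_ctx *ctx)
{
    char filename[PATHLEN];

    /* User pointer -> user helper; with this series it also works on
     * arm64, where the legacy bpf_probe_read_str() came back empty. */
    bpf_probe_read_user_str(filename, sizeof(filename), ctx->filename);
    bpf_printk("execve: %s\n", filename);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";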

An alternative fix, which applies the required logic to the existing API
without adding new API, was suggested here:
https://www.spinics.net/lists/stable/msg454945.html

Another option is to take only patches [1-4] of this patchset and add
commit 8d92db5c04d1 ("bpf: rework the compat kernel probe handling") on
top of them. In that case, that last patch would require function renames
and conflict resolution that this patchset avoids by also pulling in
patches [5-7].

Christoph Hellwig (3):
  maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
  maccess: rename strncpy_from_unsafe_strict to
    strncpy_from_kernel_nofault
  bpf: rework the compat kernel probe handling

Daniel Borkmann (4):
  uaccess: Add strict non-pagefault kernel-space read function
  bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str
    helpers
  bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus
    specifier

Petr Mladek (1):
  powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again

 Documentation/core-api/printk-formats.rst |  14 +
 arch/arm/Kconfig                          |   1 +
 arch/arm64/Kconfig                        |   1 +
 arch/powerpc/Kconfig                      |   1 +
 arch/x86/Kconfig                          |   1 +
 arch/x86/mm/Makefile                      |   2 +-
 arch/x86/mm/maccess.c                     |  43 +++
 include/linux/uaccess.h                   |   8 +-
 include/uapi/linux/bpf.h                  | 123 ++++++---
 init/Kconfig                              |   3 +
 kernel/trace/bpf_trace.c                  | 302 ++++++++++++++++------
 kernel/trace/trace_kprobe.c               |   2 +-
 lib/vsprintf.c                            |  12 +
 mm/maccess.c                              |  48 +++-
 tools/include/uapi/linux/bpf.h            | 116 ++++++---
 15 files changed, 512 insertions(+), 165 deletions(-)
 create mode 100644 arch/x86/mm/maccess.c

-- 
2.25.1




* [PATCH 1/8] uaccess: Add strict non-pagefault kernel-space read function
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
@ 2021-04-21 13:07 ` Zidenberg, Tsahi
  2021-04-21 13:08 ` bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:07 UTC (permalink / raw)
  To: stable; +Cc: Greg KH

commit 75a1a607bb7e6d918be3aca11ec2214a275392f4 upstream

Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict()
helpers which by default alias to the __probe_kernel_read() and the
__strncpy_from_unsafe(), respectively, but can be overridden by archs
which have non-overlapping address ranges for kernel space and user
space in order to bail out with -EFAULT when attempting to probe user
memory including non-canonical user access addresses [0]:

  4-level page tables:
    user-space mem: 0x0000000000000000 - 0x00007fffffffffff
    non-canonical:  0x0000800000000000 - 0xffff7fffffffffff

  5-level page tables:
    user-space mem: 0x0000000000000000 - 0x00ffffffffffffff
    non-canonical:  0x0100000000000000 - 0xfeffffffffffffff

The idea is that these helpers are complementary to the probe_user_read()
and strncpy_from_unsafe_user() which probe user-only memory. Both added
helpers here do the same, but for kernel-only addresses.

Both sets of helpers are going to be used for BPF tracing. They also
explicitly avoid throwing the splat for non-canonical user addresses from
00c42373d397 ("x86-64: add warning for non-canonical user access address
dereferences").

For compat, the current probe_kernel_read() and strncpy_from_unsafe() are
left as-is.

  [0] Documentation/x86/x86_64/mm.txt
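
For illustration only (not part of the patch), a sketch of how a
tracing-context caller is expected to pair the two families; the wrapper
function is hypothetical, while the helper names follow this patch:

#include <linux/types.h>
#include <linux/uaccess.h>

/* Illustrative only: dst must hold at least len bytes. */
static long probe_both(void *dst, size_t len, const void *kernel_ptr,
               const void __user *user_ptr)
{
    long ret;

    /* Kernel pointer: the strict variant behaves like probe_kernel_read(). */
    ret = probe_kernel_read_strict(dst, kernel_ptr, len);
    if (ret)
        return ret;

    /*
     * User pointer: on archs overriding the weak alias (x86-64 here) the
     * strict variant returns -EFAULT; probe_user_read() is the helper
     * meant for this case.
     */
    return probe_user_read(dst, user_ptr, len);
}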

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: x86@kernel.org
Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 arch/x86/mm/Makefile    |  2 +-
 arch/x86/mm/maccess.c   | 43 +++++++++++++++++++++++++++++++++++++++++
 include/linux/uaccess.h |  4 ++++
 mm/maccess.c            | 25 +++++++++++++++++++++++-
 4 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/mm/maccess.c

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 84373dc9b341..bbc68a54795e 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -13,7 +13,7 @@ CFLAGS_REMOVE_mem_encrypt_identity.o    = -pg
 endif
 
 obj-y    :=  init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
-        pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o
+        pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o maccess.o
 
 # Make sure __phys_addr has no stackprotector
 nostackp := $(call cc-option, -fno-stack-protector)
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
new file mode 100644
index 000000000000..f5b85bdc0535
--- /dev/null
+++ b/arch/x86/mm/maccess.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+
+#ifdef CONFIG_X86_64
+static __always_inline u64 canonical_address(u64 vaddr, u8 vaddr_bits)
+{
+    return ((s64)vaddr << (64 - vaddr_bits)) >> (64 - vaddr_bits);
+}
+
+static __always_inline bool invalid_probe_range(u64 vaddr)
+{
+    /*
+     * Range covering the highest possible canonical userspace address
+     * as well as non-canonical address range. For the canonical range
+     * we also need to include the userspace guard page.
+     */
+    return vaddr < TASK_SIZE_MAX + PAGE_SIZE ||
+           canonical_address(vaddr, boot_cpu_data.x86_virt_bits) != vaddr;
+}
+#else
+static __always_inline bool invalid_probe_range(u64 vaddr)
+{
+    return vaddr < TASK_SIZE_MAX;
+}
+#endif
+
+long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+{
+    if (unlikely(invalid_probe_range((unsigned long)src)))
+        return -EFAULT;
+
+    return __probe_kernel_read(dst, src, size);
+}
+
+long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr, long count)
+{
+    if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
+        return -EFAULT;
+
+    return __strncpy_from_unsafe(dst, unsafe_addr, count);
+}
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 38555435a64a..67f016010aad 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -311,6 +311,7 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
  * happens, handle that and return -EFAULT.
  */
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
+extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
 extern long __probe_kernel_read(void *dst, const void *src, size_t size);
 
 /*
@@ -350,6 +351,9 @@ extern long notrace probe_user_write(void __user *dst, const void *src, size_t s
 extern long notrace __probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
+extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+                       long count);
+extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
                      long count);
 extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
diff --git a/mm/maccess.c b/mm/maccess.c
index 2d3c3d01064c..3ca8d97e5010 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -43,11 +43,20 @@ probe_write_common(void __user *dst, const void *src, size_t size)
  * do_page_fault() doesn't attempt to take mmap_sem.  This makes
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
+ *
+ * probe_kernel_read_strict() is the same as probe_kernel_read() except for
+ * the case where architectures have non-overlapping user and kernel address
+ * ranges: probe_kernel_read_strict() will additionally return -EFAULT for
+ * probing memory on a user address range where probe_user_read() is supposed
+ * to be used instead.
  */
 
 long __weak probe_kernel_read(void *dst, const void *src, size_t size)
     __attribute__((alias("__probe_kernel_read")));
 
+long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
+    __attribute__((alias("__probe_kernel_read")));
+
 long __probe_kernel_read(void *dst, const void *src, size_t size)
 {
     long ret;
@@ -157,8 +166,22 @@ EXPORT_SYMBOL_GPL(probe_user_write);
  *
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
+ *
+ * strncpy_from_unsafe_strict() is the same as strncpy_from_unsafe() except
+ * for the case where architectures have non-overlapping user and kernel address
+ * ranges: strncpy_from_unsafe_strict() will additionally return -EFAULT for
+ * probing memory on a user address range where strncpy_from_unsafe_user() is
+ * supposed to be used instead.
  */
-long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+
+long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+    __attribute__((alias("__strncpy_from_unsafe")));
+
+long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+                       long count)
+    __attribute__((alias("__strncpy_from_unsafe")));
+
+long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 {
     mm_segment_t old_fs = get_fs();
     const void *src = unsafe_addr;
-- 
2.25.1




* bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
  2021-04-21 13:07 ` [PATCH 1/8] uaccess: Add strict non-pagefault kernel-space read function Zidenberg, Tsahi
@ 2021-04-21 13:08 ` Zidenberg, Tsahi
  2021-04-23 15:06   ` Greg KH
  2021-04-21 13:09 ` [PATCH 3/8] bpf: Restrict bpf_probe_read{, str}() only to archs where they work Zidenberg, Tsahi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:08 UTC (permalink / raw)
  To: stable; +Cc: Greg KH

commit 6ae08ae3dea2cfa03dd3665a3c8475c2d429ef47 upstream

The current bpf_probe_read() and bpf_probe_read_str() helpers are broken
in that they assume they can be used for probing memory access for kernel
space addresses /as well as/ user space addresses.

However, plain use of probe_kernel_read() for both cases will always
attempt the access against the kernel address space, given the access is
performed under KERNEL_DS. Some archs in fact have overlapping address
spaces, where a kernel pointer and a user pointer can have the /same/
address value, so accessing application memory via bpf_probe_read{,_str}()
would read garbage values.

Let's fix the BPF side by making use of the recently added 3d7081822f7f
("uaccess: Add non-pagefault user-space read functions"). Unfortunately,
the only way to fix this status quo is to add dedicated
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str()
helpers. The bpf_probe_read{,_str}() helpers are kept as-is to retain
their current behavior.

The two *_user() variants always attempt the access under USER_DS, while
the two *_kernel() variants return -EFAULT when accessing user memory if
the underlying architecture has non-overlapping address ranges, also
avoiding the kernel warning from 00c42373d397 ("x86-64: add warning for
non-canonical user access address dereferences").

Conflict resolution: upstream also defines the skb_output helper. This
commit adds skb_output to __BPF_FUNC_MAPPER, but does not pull in the code
that implements it or the comments documenting it.
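
Usage illustration only (not part of the commit): a libbpf-style BPF C
sketch of the intended split. vfs_read() is just a convenient attach point
whose second argument is a user pointer and whose fourth is a kernel
pointer; the includes and the -D__TARGET_ARCH_* requirement for
bpf_tracing.h are assumptions about the build setup.

#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>    /* PT_REGS_PARM*, needs -D__TARGET_ARCH_* */

/* vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos) */
SEC("kprobe/vfs_read")
int trace_vfs_read(struct pt_regs *ctx)
{
    char first_bytes[8] = {};
    long long pos = 0;
    const void *ubuf = (const void *)PT_REGS_PARM2(ctx);  /* user pointer */
    const void *kpos = (const void *)PT_REGS_PARM4(ctx);  /* kernel pointer */

    bpf_probe_read_user(first_bytes, sizeof(first_bytes), ubuf);
    bpf_probe_read_kernel(&pos, sizeof(pos), kpos);

    /* The legacy bpf_probe_read() keeps working here via the compat proto. */
    bpf_printk("read at pos %lld\n", pos);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";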

Fixes: a5e8c07059d0 ("bpf: add bpf_probe_read_str helper")
Fixes: 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 include/uapi/linux/bpf.h       | 123 ++++++++++++++--------
 kernel/trace/bpf_trace.c       | 181 ++++++++++++++++++++++++---------
 tools/include/uapi/linux/bpf.h | 116 +++++++++++++--------
 3 files changed, 294 insertions(+), 126 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8649422e760c..2fe91a083f7a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -560,10 +560,13 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * int bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
  *         For tracing programs, safely attempt to read *size* bytes from
- *         address *src* and store the data in *dst*.
+ *         kernel space address *unsafe_ptr* and store the data in *dst*.
+ *
+ *         Generally, use bpf_probe_read_user() or bpf_probe_read_kernel()
+ *         instead.
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
@@ -1425,45 +1428,14 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+ * int bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
- *         Copy a NUL terminated string from an unsafe address
- *         *unsafe_ptr* to *dst*. The *size* should include the
- *         terminating NUL byte. In case the string length is smaller than
- *         *size*, the target is not padded with further NUL bytes. If the
- *         string length is larger than *size*, just *size*-1 bytes are
- *         copied and the last byte is set to NUL.
- *
- *         On success, the length of the copied string is returned. This
- *         makes this helper useful in tracing programs for reading
- *         strings, and more importantly to get its length at runtime. See
- *         the following snippet:
- *
- *         ::
- *
- *             SEC("kprobe/sys_open")
- *             void bpf_sys_open(struct pt_regs *ctx)
- *             {
- *                     char buf[PATHLEN]; // PATHLEN is defined to 256
- *                     int res = bpf_probe_read_str(buf, sizeof(buf),
- *                                              ctx->di);
- *
- *                 // Consume buf, for example push it to
- *                 // userspace via bpf_perf_event_output(); we
- *                 // can use res (the string length) as event
- *                 // size, after checking its boundaries.
- *             }
- *
- *         In comparison, using **bpf_probe_read()** helper here instead
- *         to read the string would require to estimate the length at
- *         compile time, and would often result in copying more memory
- *         than necessary.
+ *         Copy a NUL terminated string from an unsafe kernel address
+ *         *unsafe_ptr* to *dst*. See bpf_probe_read_kernel_str() for
+ *         more details.
  *
- *         Another useful use case is when parsing individual process
- *         arguments or individual environment variables navigating
- *         *current*\ **->mm->arg_start** and *current*\
- *         **->mm->env_start**: using this helper and the return value,
- *         one can quickly iterate at the right offset of the memory area.
+ *         Generally, use bpf_probe_read_user_str() or bpf_probe_read_kernel_str()
+ *         instead.
  *     Return
  *         On success, the strictly positive length of the string,
  *         including the trailing NUL character. On error, a negative
@@ -2750,6 +2722,72 @@ union bpf_attr {
  *        **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *        **-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Safely attempt to read *size* bytes from user space address
+ *         *unsafe_ptr* and store the data in *dst*.
+ *     Return
+ *         0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Safely attempt to read *size* bytes from kernel space address
+ *         *unsafe_ptr* and store the data in *dst*.
+ *     Return
+ *         0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_user_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe user address
+ *         *unsafe_ptr* to *dst*. The *size* should include the
+ *         terminating NUL byte. In case the string length is smaller than
+ *         *size*, the target is not padded with further NUL bytes. If the
+ *         string length is larger than *size*, just *size*-1 bytes are
+ *         copied and the last byte is set to NUL.
+ *
+ *         On success, the length of the copied string is returned. This
+ *         makes this helper useful in tracing programs for reading
+ *         strings, and more importantly to get its length at runtime. See
+ *         the following snippet:
+ *
+ *         ::
+ *
+ *             SEC("kprobe/sys_open")
+ *             void bpf_sys_open(struct pt_regs *ctx)
+ *             {
+ *                     char buf[PATHLEN]; // PATHLEN is defined to 256
+ *                     int res = bpf_probe_read_user_str(buf, sizeof(buf),
+ *                                                   ctx->di);
+ *
+ *                 // Consume buf, for example push it to
+ *                 // userspace via bpf_perf_event_output(); we
+ *                 // can use res (the string length) as event
+ *                 // size, after checking its boundaries.
+ *             }
+ *
+ *         In comparison, using **bpf_probe_read_user()** helper here
+ *         instead to read the string would require to estimate the length
+ *         at compile time, and would often result in copying more memory
+ *         than necessary.
+ *
+ *         Another useful use case is when parsing individual process
+ *         arguments or individual environment variables navigating
+ *         *current*\ **->mm->arg_start** and *current*\
+ *         **->mm->env_start**: using this helper and the return value,
+ *         one can quickly iterate at the right offset of the memory area.
+ *     Return
+ *         On success, the strictly positive length of the string,
+ *         including the trailing NUL character. On error, a negative
+ *         value.
+ *
+ * int bpf_probe_read_kernel_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe kernel address *unsafe_ptr*
+ *         to *dst*. Same semantics as with bpf_probe_read_user_str() apply.
+ *     Return
+ *         On success, the strictly positive length of the string,    including
+ *         the trailing NUL character. On error, a negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)        \
     FN(unspec),            \
@@ -2862,7 +2900,12 @@ union bpf_attr {
     FN(sk_storage_get),        \
     FN(sk_storage_delete),        \
     FN(send_signal),        \
-    FN(tcp_gen_syncookie),
+    FN(tcp_gen_syncookie),        \
+    FN(skb_output),            \
+    FN(probe_read_user),        \
+    FN(probe_read_kernel),        \
+    FN(probe_read_user_str),    \
+    FN(probe_read_kernel_str),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 74c1db7178cf..0e329d48ab08 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -138,24 +138,140 @@ static const struct bpf_func_proto bpf_override_return_proto = {
 };
 #endif
 
-BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
+BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
+       const void __user *, unsafe_ptr)
 {
-    int ret;
+    int ret = probe_user_read(dst, unsafe_ptr, size);
 
-    ret = security_locked_down(LOCKDOWN_BPF_READ);
-    if (ret < 0)
-        goto out;
+    if (unlikely(ret < 0))
+        memset(dst, 0, size);
+
+    return ret;
+}
+
+static const struct bpf_func_proto bpf_probe_read_user_proto = {
+    .func        = bpf_probe_read_user,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
+       const void __user *, unsafe_ptr)
+{
+    int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
+
+    if (unlikely(ret < 0))
+        memset(dst, 0, size);
+
+    return ret;
+}
+
+static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
+    .func        = bpf_probe_read_user_str,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
 
-    ret = probe_kernel_read(dst, unsafe_ptr, size);
+static __always_inline int
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
+                 const bool compat)
+{
+    int ret = security_locked_down(LOCKDOWN_BPF_READ);
+
+    if (unlikely(ret < 0))
+        goto out;
+    ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
+          probe_kernel_read_strict(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
 out:
         memset(dst, 0, size);
+    return ret;
+}
+
+BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
+}
+
+static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
+    .func        = bpf_probe_read_kernel,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, true);
+}
 
+static const struct bpf_func_proto bpf_probe_read_compat_proto = {
+    .func        = bpf_probe_read_compat,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+static __always_inline int
+bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
+                 const bool compat)
+{
+    int ret = security_locked_down(LOCKDOWN_BPF_READ);
+
+    if (unlikely(ret < 0))
+        goto out;
+    /*
+     * The strncpy_from_unsafe_*() call will likely not fill the entire
+     * buffer, but that's okay in this circumstance as we're probing
+     * arbitrary memory anyway similar to bpf_probe_read_*() and might
+     * as well probe the stack. Thus, memory is explicitly cleared
+     * only in error case, so that improper users ignoring return
+     * code altogether don't copy garbage; otherwise length of string
+     * is returned that can be used for bpf_perf_event_output() et al.
+     */
+    ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
+          strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
+    if (unlikely(ret < 0))
+out:
+        memset(dst, 0, size);
     return ret;
 }
 
-static const struct bpf_func_proto bpf_probe_read_proto = {
-    .func        = bpf_probe_read,
+BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
+}
+
+static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
+    .func        = bpf_probe_read_kernel_str,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_probe_read_compat_str, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, true);
+}
+
+static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
+    .func        = bpf_probe_read_compat_str,
     .gpl_only    = true,
     .ret_type    = RET_INTEGER,
     .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
@@ -583,41 +699,6 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
     .arg2_type      = ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
-       const void *, unsafe_ptr)
-{
-    int ret;
-
-    ret = security_locked_down(LOCKDOWN_BPF_READ);
-    if (ret < 0)
-        goto out;
-
-    /*
-     * The strncpy_from_unsafe() call will likely not fill the entire
-     * buffer, but that's okay in this circumstance as we're probing
-     * arbitrary memory anyway similar to bpf_probe_read() and might
-     * as well probe the stack. Thus, memory is explicitly cleared
-     * only in error case, so that improper users ignoring return
-     * code altogether don't copy garbage; otherwise length of string
-     * is returned that can be used for bpf_perf_event_output() et al.
-     */
-    ret = strncpy_from_unsafe(dst, unsafe_ptr, size);
-    if (unlikely(ret < 0))
-out:
-        memset(dst, 0, size);
-
-    return ret;
-}
-
-static const struct bpf_func_proto bpf_probe_read_str_proto = {
-    .func        = bpf_probe_read_str,
-    .gpl_only    = true,
-    .ret_type    = RET_INTEGER,
-    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
-    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
-    .arg3_type    = ARG_ANYTHING,
-};
-
 struct send_signal_irq_work {
     struct irq_work irq_work;
     struct task_struct *task;
@@ -697,8 +778,6 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
         return &bpf_map_pop_elem_proto;
     case BPF_FUNC_map_peek_elem:
         return &bpf_map_peek_elem_proto;
-    case BPF_FUNC_probe_read:
-        return &bpf_probe_read_proto;
     case BPF_FUNC_ktime_get_ns:
         return &bpf_ktime_get_ns_proto;
     case BPF_FUNC_tail_call:
@@ -725,8 +804,18 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
         return &bpf_current_task_under_cgroup_proto;
     case BPF_FUNC_get_prandom_u32:
         return &bpf_get_prandom_u32_proto;
+    case BPF_FUNC_probe_read_user:
+        return &bpf_probe_read_user_proto;
+    case BPF_FUNC_probe_read_kernel:
+        return &bpf_probe_read_kernel_proto;
+    case BPF_FUNC_probe_read:
+        return &bpf_probe_read_compat_proto;
+    case BPF_FUNC_probe_read_user_str:
+        return &bpf_probe_read_user_str_proto;
+    case BPF_FUNC_probe_read_kernel_str:
+        return &bpf_probe_read_kernel_str_proto;
     case BPF_FUNC_probe_read_str:
-        return &bpf_probe_read_str_proto;
+        return &bpf_probe_read_compat_str_proto;
 #ifdef CONFIG_CGROUPS
     case BPF_FUNC_get_current_cgroup_id:
         return &bpf_get_current_cgroup_id_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 8649422e760c..e4014608849d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -560,10 +560,13 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * int bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
  *         For tracing programs, safely attempt to read *size* bytes from
- *         address *src* and store the data in *dst*.
+ *         kernel space address *unsafe_ptr* and store the data in *dst*.
+ *
+ *         Generally, use bpf_probe_read_user() or bpf_probe_read_kernel()
+ *         instead.
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
@@ -1425,45 +1428,14 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+ * int bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
- *         Copy a NUL terminated string from an unsafe address
- *         *unsafe_ptr* to *dst*. The *size* should include the
- *         terminating NUL byte. In case the string length is smaller than
- *         *size*, the target is not padded with further NUL bytes. If the
- *         string length is larger than *size*, just *size*-1 bytes are
- *         copied and the last byte is set to NUL.
- *
- *         On success, the length of the copied string is returned. This
- *         makes this helper useful in tracing programs for reading
- *         strings, and more importantly to get its length at runtime. See
- *         the following snippet:
- *
- *         ::
- *
- *             SEC("kprobe/sys_open")
- *             void bpf_sys_open(struct pt_regs *ctx)
- *             {
- *                     char buf[PATHLEN]; // PATHLEN is defined to 256
- *                     int res = bpf_probe_read_str(buf, sizeof(buf),
- *                                              ctx->di);
- *
- *                 // Consume buf, for example push it to
- *                 // userspace via bpf_perf_event_output(); we
- *                 // can use res (the string length) as event
- *                 // size, after checking its boundaries.
- *             }
+ *         Copy a NUL terminated string from an unsafe kernel address
+ *         *unsafe_ptr* to *dst*. See bpf_probe_read_kernel_str() for
+ *         more details.
  *
- *         In comparison, using **bpf_probe_read()** helper here instead
- *         to read the string would require to estimate the length at
- *         compile time, and would often result in copying more memory
- *         than necessary.
- *
- *         Another useful use case is when parsing individual process
- *         arguments or individual environment variables navigating
- *         *current*\ **->mm->arg_start** and *current*\
- *         **->mm->env_start**: using this helper and the return value,
- *         one can quickly iterate at the right offset of the memory area.
+ *         Generally, use bpf_probe_read_user_str() or bpf_probe_read_kernel_str()
+ *         instead.
  *     Return
  *         On success, the strictly positive length of the string,
  *         including the trailing NUL character. On error, a negative
@@ -2750,6 +2722,65 @@ union bpf_attr {
  *        **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *        **-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Safely attempt to read *size* bytes from kernel space address
+ *         *unsafe_ptr* and store the data in *dst*.
+ *     Return
+ *         0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_user_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe user address
+ *         *unsafe_ptr* to *dst*. The *size* should include the
+ *         terminating NUL byte. In case the string length is smaller than
+ *         *size*, the target is not padded with further NUL bytes. If the
+ *         string length is larger than *size*, just *size*-1 bytes are
+ *         copied and the last byte is set to NUL.
+ *
+ *         On success, the length of the copied string is returned. This
+ *         makes this helper useful in tracing programs for reading
+ *         strings, and more importantly to get its length at runtime. See
+ *         the following snippet:
+ *
+ *         ::
+ *
+ *             SEC("kprobe/sys_open")
+ *             void bpf_sys_open(struct pt_regs *ctx)
+ *             {
+ *                     char buf[PATHLEN]; // PATHLEN is defined to 256
+ *                     int res = bpf_probe_read_user_str(buf, sizeof(buf),
+ *                                                   ctx->di);
+ *
+ *                 // Consume buf, for example push it to
+ *                 // userspace via bpf_perf_event_output(); we
+ *                 // can use res (the string length) as event
+ *                 // size, after checking its boundaries.
+ *             }
+ *
+ *         In comparison, using **bpf_probe_read_user()** helper here
+ *         instead to read the string would require to estimate the length
+ *         at compile time, and would often result in copying more memory
+ *         than necessary.
+ *
+ *         Another useful use case is when parsing individual process
+ *         arguments or individual environment variables navigating
+ *         *current*\ **->mm->arg_start** and *current*\
+ *         **->mm->env_start**: using this helper and the return value,
+ *         one can quickly iterate at the right offset of the memory area.
+ *     Return
+ *         On success, the strictly positive length of the string,
+ *         including the trailing NUL character. On error, a negative
+ *         value.
+ *
+ * int bpf_probe_read_kernel_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe kernel address *unsafe_ptr*
+ *         to *dst*. Same semantics as with bpf_probe_read_user_str() apply.
+ *     Return
+ *         On success, the strictly positive length of the string,    including
+ *         the trailing NUL character. On error, a negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)        \
     FN(unspec),            \
@@ -2862,7 +2893,12 @@ union bpf_attr {
     FN(sk_storage_get),        \
     FN(sk_storage_delete),        \
     FN(send_signal),        \
-    FN(tcp_gen_syncookie),
+    FN(tcp_gen_syncookie),        \
+    FN(skb_output),            \
+    FN(probe_read_user),        \
+    FN(probe_read_kernel),        \
+    FN(probe_read_user_str),    \
+    FN(probe_read_kernel_str),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.25.1




* [PATCH 3/8] bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
  2021-04-21 13:07 ` [PATCH 1/8] uaccess: Add strict non-pagefault kernel-space read function Zidenberg, Tsahi
  2021-04-21 13:08 ` bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
@ 2021-04-21 13:09 ` Zidenberg, Tsahi
  2021-04-21 13:10 ` [PATCH 4/8] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again Zidenberg, Tsahi
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:09 UTC (permalink / raw)
  To: stable; +Cc: Greg KH

commit 0ebeea8ca8a4d1d453ad299aef0507dab04f6e8d upstream

Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.

To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").

Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.

However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).

The remaining archs can work around this easily by relying on the feature
probe from bpftool, which spills out defines that can be used from BPF C
code to implement a drop-in replacement for old/new kernels via:
bpftool feature probe macro
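
For illustration only (not part of the patch), the drop-in pattern referred
to above could look as follows in BPF C. The header name is arbitrary, and
the HAVE_PROG_TYPE_HELPER()/BPF__PROG_TYPE_* defines are assumed to match
the macro output of bpftool's feature probe; check them against the bpftool
version in use, and assume a libbpf that already declares the new helpers.

#include <bpf/bpf_helpers.h>

/* Generated beforehand with: bpftool feature probe macros > bpf_features.h */
#include "bpf_features.h"

static __always_inline int
probe_read_kernel_compat(void *dst, unsigned int size, const void *unsafe_ptr)
{
#if HAVE_PROG_TYPE_HELPER(kprobe, bpf_probe_read_kernel)
    /* New kernels: dedicated kernel-space helper. */
    return bpf_probe_read_kernel(dst, size, unsafe_ptr);
#else
    /* Old kernels: legacy helper, still exposed on this arch. */
    return bpf_probe_read(dst, size, unsafe_ptr);
#endif
}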

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 arch/arm/Kconfig         | 1 +
 arch/arm64/Kconfig       | 1 +
 arch/x86/Kconfig         | 1 +
 init/Kconfig             | 3 +++
 kernel/trace/bpf_trace.c | 6 ++++--
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9aa88715f196..70f4057fb5b0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM
     select ARCH_HAS_KEEPINITRD
     select ARCH_HAS_KCOV
     select ARCH_HAS_MEMBARRIER_SYNC_CORE
+    select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
     select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
     select ARCH_HAS_PHYS_TO_DMA
     select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9c8ea5939865..a8c49916ab8c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -22,6 +22,7 @@ config ARM64
     select ARCH_HAS_KCOV
     select ARCH_HAS_KEEPINITRD
     select ARCH_HAS_MEMBARRIER_SYNC_CORE
+    select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
     select ARCH_HAS_PTE_DEVMAP
     select ARCH_HAS_PTE_SPECIAL
     select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8ef85139553f..b9f666db10c1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -70,6 +70,7 @@ config X86
     select ARCH_HAS_KCOV            if X86_64
     select ARCH_HAS_MEM_ENCRYPT
     select ARCH_HAS_MEMBARRIER_SYNC_CORE
+    select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
     select ARCH_HAS_PMEM_API        if X86_64
     select ARCH_HAS_PTE_DEVMAP        if X86_64
     select ARCH_HAS_PTE_SPECIAL
diff --git a/init/Kconfig b/init/Kconfig
index 96fc45d1b686..1781810d1501 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2210,6 +2210,9 @@ config ASN1
 
 source "kernel/Kconfig.locks"
 
+config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+    bool
+
 config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
     bool
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0e329d48ab08..80f0072b31e0 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -808,14 +808,16 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
         return &bpf_probe_read_user_proto;
     case BPF_FUNC_probe_read_kernel:
         return &bpf_probe_read_kernel_proto;
-    case BPF_FUNC_probe_read:
-        return &bpf_probe_read_compat_proto;
     case BPF_FUNC_probe_read_user_str:
         return &bpf_probe_read_user_str_proto;
     case BPF_FUNC_probe_read_kernel_str:
         return &bpf_probe_read_kernel_str_proto;
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+    case BPF_FUNC_probe_read:
+        return &bpf_probe_read_compat_proto;
     case BPF_FUNC_probe_read_str:
         return &bpf_probe_read_compat_str_proto;
+#endif
 #ifdef CONFIG_CGROUPS
     case BPF_FUNC_get_current_cgroup_id:
         return &bpf_get_current_cgroup_id_proto;
-- 
2.25.1




* [PATCH 4/8] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (2 preceding siblings ...)
  2021-04-21 13:09 ` [PATCH 3/8] bpf: Restrict bpf_probe_read{, str}() only to archs where they work Zidenberg, Tsahi
@ 2021-04-21 13:10 ` Zidenberg, Tsahi
  2021-04-21 13:11 ` [PATCH 5/8] bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier Zidenberg, Tsahi
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:10 UTC (permalink / raw)
  To: stable; +Cc: Greg KH


commit d195b1d1d1196681ac4775e0361e9cca70f740c2 upstream

Commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only
to archs where they work") made the bpf_probe_read{, str}() functions no
longer available on architectures where the same logical address might
have different content in the kernel and user memory mappings. These
architectures should use the probe_read_{user,kernel}_str helpers instead.

For backward compatibility, the problematic functions are still available
on architectures where the user and kernel address spaces do not overlap.
This is indicated by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

At the moment, these backward-compatible functions are enabled only on
x86_64, arm, and arm64. Let's enable them on powerpc as well, since it
also has non-overlapping address spaces.

Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/20200527122844.19524-1-pmladek@suse.com
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c4cbb65e742f..c50bfab7930b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,6 +129,7 @@ config PPC
     select ARCH_HAS_MMIOWB            if PPC64
     select ARCH_HAS_PHYS_TO_DMA
     select ARCH_HAS_PMEM_API
+    select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
     select ARCH_HAS_PTE_DEVMAP        if PPC_BOOK3S_64
     select ARCH_HAS_PTE_SPECIAL
     select ARCH_HAS_MEMBARRIER_CALLBACKS
-- 
2.25.1




* [PATCH 5/8] bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (3 preceding siblings ...)
  2021-04-21 13:10 ` [PATCH 4/8] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again Zidenberg, Tsahi
@ 2021-04-21 13:11 ` Zidenberg, Tsahi
  2021-04-21 13:12 ` [PATCH 6/8] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault Zidenberg, Tsahi
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:11 UTC (permalink / raw)
  To: stable; +Cc: Greg KH


commit b2a5212fb634561bb734c6356904e37f6665b955 upstream

Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the
very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on
archs with overlapping address ranges.

While the helpers have been addressed through work in 6ae08ae3dea2 ("bpf: Add
probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need
an option for bpf_trace_printk() as well to fix it.

Similarly to the helpers, force users to make an explicit choice by adding
the %pks and %pus specifiers to bpf_trace_printk(), which will then pick the
corresponding strncpy_from_unsafe*() variant to perform the access under
KERNEL_DS or USER_DS. The %pk* (kernel specifier) and %pu* (user specifier)
can later also be extended to other objects besides strings that are probed
and printed under tracing, and reused by other facilities such as
bpf_seq_printf() or BTF-based type printing.

The existing behavior of %s is kept working for current users on archs where
it is not broken, and is therefore gated through
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. For archs not having this
property we fall back to probing under KERNEL_DS as a sensible default.

Conflict resolution: in vsprintf.c, add the new [k/u] options without the
other options that were added upstream.
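
Usage illustration only (not part of the commit); the attach point is chosen
just because its second argument is a user-space string, and bpf_printk() is
the usual libbpf wrapper around bpf_trace_printk():

#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>    /* PT_REGS_PARM*, needs -D__TARGET_ARCH_* */

/* do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) */
SEC("kprobe/do_sys_open")
int trace_open(struct pt_regs *ctx)
{
    const char *u_filename = (const char *)PT_REGS_PARM2(ctx);

    /* %pus: bpf_trace_printk() probes the pointer as user memory. */
    bpf_printk("open %pus\n", u_filename);

    /*
     * %pks is used the same way for a pointer known to be a kernel string;
     * plain %s keeps its old behaviour only on archs selecting
     * ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
     */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";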

Fixes: 8d3b7dce8622 ("bpf: add support for %s specifier to bpf_trace_printk()")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 Documentation/core-api/printk-formats.rst | 14 ++++
 kernel/trace/bpf_trace.c                  | 94 +++++++++++++++--------
 lib/vsprintf.c                            | 12 +++
 3 files changed, 88 insertions(+), 32 deletions(-)

diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index ecbebf4ca8e7..cf8c2ad8abef 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -110,6 +110,20 @@ used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
 when tail-calls are used and marked with the noreturn GCC attribute.
 
+Probed Pointers from BPF / tracing
+----------------------------------
+
+::
+
+    %pks    kernel string
+    %pus    user string
+
+The ``k`` and ``u`` specifiers are used for printing prior probed memory from
+either kernel memory (k) or user memory (u). The subsequent ``s`` specifier
+results in printing a string. For direct use in regular vsnprintf() the (k)
+and (u) annotation is ignored, however, when used out of BPF's bpf_trace_printk(),
+for example, it reads the memory it is pointing to without faulting.
+
 Kernel Pointers
 ---------------
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 80f0072b31e0..396b91a9b669 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -325,17 +325,15 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
 
 /*
  * Only limited trace_printk() conversion specifiers allowed:
- * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %s
+ * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %pks %pus %s
  */
 BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
        u64, arg2, u64, arg3)
 {
+    int i, mod[3] = {}, fmt_cnt = 0;
+    char buf[64], fmt_ptype;
+    void *unsafe_ptr = NULL;
     bool str_seen = false;
-    int mod[3] = {};
-    int fmt_cnt = 0;
-    u64 unsafe_addr;
-    char buf[64];
-    int i;
 
     /*
      * bpf_check()->check_func_arg()->check_stack_boundary()
@@ -361,40 +359,71 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
         if (fmt[i] == 'l') {
             mod[fmt_cnt]++;
             i++;
-        } else if (fmt[i] == 'p' || fmt[i] == 's') {
+        } else if (fmt[i] == 'p') {
             mod[fmt_cnt]++;
+            if ((fmt[i + 1] == 'k' ||
+                 fmt[i + 1] == 'u') &&
+                fmt[i + 2] == 's') {
+                fmt_ptype = fmt[i + 1];
+                i += 2;
+                goto fmt_str;
+            }
+
             /* disallow any further format extensions */
             if (fmt[i + 1] != 0 &&
                 !isspace(fmt[i + 1]) &&
                 !ispunct(fmt[i + 1]))
                 return -EINVAL;
-            fmt_cnt++;
-            if (fmt[i] == 's') {
-                if (str_seen)
-                    /* allow only one '%s' per fmt string */
-                    return -EINVAL;
-                str_seen = true;
-
-                switch (fmt_cnt) {
-                case 1:
-                    unsafe_addr = arg1;
-                    arg1 = (long) buf;
-                    break;
-                case 2:
-                    unsafe_addr = arg2;
-                    arg2 = (long) buf;
-                    break;
-                case 3:
-                    unsafe_addr = arg3;
-                    arg3 = (long) buf;
-                    break;
-                }
-                buf[0] = 0;
-                strncpy_from_unsafe(buf,
-                            (void *) (long) unsafe_addr,
+
+            goto fmt_next;
+        } else if (fmt[i] == 's') {
+            mod[fmt_cnt]++;
+            fmt_ptype = fmt[i];
+fmt_str:
+            if (str_seen)
+                /* allow only one '%s' per fmt string */
+                return -EINVAL;
+            str_seen = true;
+
+            if (fmt[i + 1] != 0 &&
+                !isspace(fmt[i + 1]) &&
+                !ispunct(fmt[i + 1]))
+                return -EINVAL;
+
+            switch (fmt_cnt) {
+            case 0:
+                unsafe_ptr = (void *)(long)arg1;
+                arg1 = (long)buf;
+                break;
+            case 1:
+                unsafe_ptr = (void *)(long)arg2;
+                arg2 = (long)buf;
+                break;
+            case 2:
+                unsafe_ptr = (void *)(long)arg3;
+                arg3 = (long)buf;
+                break;
+            }
+
+            buf[0] = 0;
+            switch (fmt_ptype) {
+            case 's':
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+                strncpy_from_unsafe(buf, unsafe_ptr,
                             sizeof(buf));
+                break;
+#endif
+            case 'k':
+                strncpy_from_unsafe_strict(buf, unsafe_ptr,
+                               sizeof(buf));
+                break;
+            case 'u':
+                strncpy_from_unsafe_user(buf,
+                    (__force void __user *)unsafe_ptr,
+                             sizeof(buf));
+                break;
             }
-            continue;
+            goto fmt_next;
         }
 
         if (fmt[i] == 'l') {
@@ -405,6 +434,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
         if (fmt[i] != 'i' && fmt[i] != 'd' &&
             fmt[i] != 'u' && fmt[i] != 'x')
             return -EINVAL;
+fmt_next:
         fmt_cnt++;
     }
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index fb4af73142b4..985ea5c87465 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2116,6 +2116,10 @@ static char *kobject_string(char *buf, char *end, void *ptr,
  *                  c major compatible string
  *                  C full compatible string
  * - 'x' For printing the address. Equivalent to "%lx".
+ * - '[ku]s' For a BPF/tracing related format specifier, e.g. used out of
+ *           bpf_trace_printk() where [ku] prefix specifies either kernel (k)
+ *           or user (u) memory to probe, and:
+ *              s a string, equivalent to "%s" on direct vsnprintf() use
  *
  * ** When making changes please also update:
  *    Documentation/core-api/printk-formats.rst
@@ -2194,6 +2198,14 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
         return kobject_string(buf, end, ptr, spec, fmt);
     case 'x':
         return pointer_string(buf, end, ptr, spec);
+    case 'u':
+    case 'k':
+        switch (fmt[1]) {
+        case 's':
+            return string(buf, end, ptr, spec);
+        default:
+            return error_string(buf, end, "(einval)", spec);
+        }
     }
 
     /* default is to _not_ leak addresses, hash before printing */
-- 
2.25.1




* [PATCH 6/8] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (4 preceding siblings ...)
  2021-04-21 13:11 ` [PATCH 5/8] bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier Zidenberg, Tsahi
@ 2021-04-21 13:12 ` Zidenberg, Tsahi
  2021-04-21 13:13 ` [PATCH 7/8] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault Zidenberg, Tsahi
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:12 UTC (permalink / raw)
  To: stable; +Cc: Greg KH


commit bd88bb5d4007949be7154deae7cef7173c751a95 upstream

This matches the naming of strncpy_from_user, and also makes it more
clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 include/linux/uaccess.h     | 4 ++--
 kernel/trace/bpf_trace.c    | 4 ++--
 kernel/trace/trace_kprobe.c | 2 +-
 mm/maccess.c                | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad..23e655549be2 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -354,8 +354,8 @@ extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
                        long count);
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
-                     long count);
+long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
+        long count);
 extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 
 /**
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 396b91a9b669..720d78c62d05 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -161,7 +161,7 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
 BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
        const void __user *, unsafe_ptr)
 {
-    int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
+    int ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
 
     if (unlikely(ret < 0))
         memset(dst, 0, size);
@@ -418,7 +418,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
                                sizeof(buf));
                 break;
             case 'u':
-                strncpy_from_unsafe_user(buf,
+                strncpy_from_user_nofault(buf,
                     (__force void __user *)unsafe_ptr,
                              sizeof(buf));
                 break;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 233322c77b76..6e26364f1005 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1104,7 +1104,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
 
     __dest = get_loc_data(dest, base);
 
-    ret = strncpy_from_unsafe_user(__dest, uaddr, maxlen);
+    ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
     if (ret >= 0)
         *(u32 *)dest = make_data_loc(ret, __dest - base);
 
diff --git a/mm/maccess.c b/mm/maccess.c
index 3ca8d97e5010..84c598673aa9 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -205,7 +205,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 }
 
 /**
- * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ * strncpy_from_user_nofault: - Copy a NUL terminated string from unsafe user
  *                address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
  *         least @count bytes long.
@@ -222,7 +222,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
                   long count)
 {
     mm_segment_t old_fs = get_fs();
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 7/8] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (5 preceding siblings ...)
  2021-04-21 13:12 ` [PATCH 6/8] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault Zidenberg, Tsahi
@ 2021-04-21 13:13 ` Zidenberg, Tsahi
  2021-04-21 13:14 ` [PATCH 8/8] bpf: rework the compat kernel probe handling Zidenberg, Tsahi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:13 UTC (permalink / raw)
  To: stable; +Cc: Greg KH


commit c4cb164426aebd635baa53685b0ebf1a127e9803 upstream

This matches the naming of strncpy_from_user_nofault, and also makes it
more clear what the function is supposed to do.

conflict resolution: comments in mm/maccess.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-8-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 arch/x86/mm/maccess.c    |  2 +-
 include/linux/uaccess.h  |  4 ++--
 kernel/trace/bpf_trace.c |  4 ++--
 mm/maccess.c             | 31 +++++++++++++++++++++++++------
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index f5b85bdc0535..62c4017a2473 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -34,7 +34,7 @@ long probe_kernel_read_strict(void *dst, const void *src, size_t size)
     return __probe_kernel_read(dst, src, size);
 }
 
-long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr, long count)
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 {
     if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
         return -EFAULT;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 23e655549be2..7c61d3ddae57 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -351,8 +351,8 @@ extern long notrace probe_user_write(void __user *dst, const void *src, size_t s
 extern long notrace __probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
-                       long count);
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
+        long count);
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
         long count);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 720d78c62d05..7b905aa800b2 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -242,7 +242,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
      * is returned that can be used for bpf_perf_event_output() et al.
      */
     ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
-          strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
+          strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
 out:
         memset(dst, 0, size);
@@ -414,7 +414,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
                 break;
 #endif
             case 'k':
-                strncpy_from_unsafe_strict(buf, unsafe_ptr,
+                strncpy_from_kernel_nofault(buf, unsafe_ptr,
                                sizeof(buf));
                 break;
             case 'u':
diff --git a/mm/maccess.c b/mm/maccess.c
index 84c598673aa9..82863bc6b550 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -167,17 +167,36 @@ EXPORT_SYMBOL_GPL(probe_user_write);
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  *
- * strncpy_from_unsafe_strict() is the same as strncpy_from_unsafe() except
- * for the case where architectures have non-overlapping user and kernel address
- * ranges: strncpy_from_unsafe_strict() will additionally return -EFAULT for
- * probing memory on a user address range where strncpy_from_unsafe_user() is
- * supposed to be used instead.
+ * Same as strncpy_from_kernel_nofault() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address ranges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
  */
 
 long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
-long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+/**
+ * strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
+ *                 address.
+ * @dst:   Destination address, in kernel space.  This buffer must be at
+ *         least @count bytes long.
+ * @unsafe_addr: Unsafe address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
+long __weak strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
                        long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 8/8] bpf: rework the compat kernel probe handling
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (6 preceding siblings ...)
  2021-04-21 13:13 ` [PATCH 7/8] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault Zidenberg, Tsahi
@ 2021-04-21 13:14 ` Zidenberg, Tsahi
  2021-04-21 13:15 ` [PATCH 2/8] bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:14 UTC (permalink / raw)
  To: stable; +Cc: Greg KH


commit 8d92db5c04d10381f4db70ed99b1b576f5db18a7 upstream

Instead of using the dangerous probe_kernel_read and strncpy_from_unsafe
helpers, rework the compat probes to check if an address is a kernel or
userspace one, and then use the low-level kernel or user probe helper
shared by the proper kernel and user probe helpers.  This slightly
changes behavior as the compat probe on a user address doesn't check
the lockdown flags, just as the pure user probes do.

conflict resolution: in kernel 5.4, these bpf_func_proto structs are
static.
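
To make the reworked flow easier to follow before reading the diff, here is a
condensed sketch of the dispatch this patch introduces (the sketch name is
illustrative only; the two *_common() helpers are the ones added earlier in
the series, and the _str variant is dispatched the same way):

/* Classify the pointer by TASK_SIZE and route it to the user or kernel
 * low-level helper. Only the kernel path performs the lockdown check,
 * which is the behavior change described above.
 */
static int compat_probe_read_sketch(void *dst, u32 size, const void *unsafe_ptr)
{
    if ((unsigned long)unsafe_ptr < TASK_SIZE)
        return bpf_probe_read_user_common(dst, size,
                (__force void __user *)unsafe_ptr);
    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
}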

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-14-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 kernel/trace/bpf_trace.c | 109 ++++++++++++++++++++++++---------------
 1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7b905aa800b2..ac07eeb4fa06 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -138,17 +138,23 @@ static const struct bpf_func_proto bpf_override_return_proto = {
 };
 #endif
 
-BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
-       const void __user *, unsafe_ptr)
+static __always_inline int
+bpf_probe_read_user_common(void *dst, u32 size, const void __user *unsafe_ptr)
 {
-    int ret = probe_user_read(dst, unsafe_ptr, size);
+    int ret;
 
+    ret = probe_user_read(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
         memset(dst, 0, size);
-
     return ret;
 }
 
+BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
+       const void __user *, unsafe_ptr)
+{
+    return bpf_probe_read_user_common(dst, size, unsafe_ptr);
+}
+
 static const struct bpf_func_proto bpf_probe_read_user_proto = {
     .func        = bpf_probe_read_user,
     .gpl_only    = true,
@@ -158,17 +164,24 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
     .arg3_type    = ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
-       const void __user *, unsafe_ptr)
+static __always_inline int
+bpf_probe_read_user_str_common(void *dst, u32 size,
+                   const void __user *unsafe_ptr)
 {
-    int ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
+    int ret;
 
+    ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
         memset(dst, 0, size);
-
     return ret;
 }
 
+BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
+       const void __user *, unsafe_ptr)
+{
+    return bpf_probe_read_user_str_common(dst, size, unsafe_ptr);
+}
+
 static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
     .func        = bpf_probe_read_user_str,
     .gpl_only    = true,
@@ -179,25 +192,25 @@ static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 };
 
 static __always_inline int
-bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
-                 const bool compat)
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
 {
     int ret = security_locked_down(LOCKDOWN_BPF_READ);
 
     if (unlikely(ret < 0))
-        goto out;
-    ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
-          probe_kernel_read_strict(dst, unsafe_ptr, size);
+        goto fail;
+    ret = probe_kernel_read_strict(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
-out:
-        memset(dst, 0, size);
+        goto fail;
+    return ret;
+fail:
+    memset(dst, 0, size);
     return ret;
 }
 
 BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
        const void *, unsafe_ptr)
 {
-    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
+    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
 }
 
 static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
@@ -209,50 +222,37 @@ static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
     .arg3_type    = ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
-       const void *, unsafe_ptr)
-{
-    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, true);
-}
-
-static const struct bpf_func_proto bpf_probe_read_compat_proto = {
-    .func        = bpf_probe_read_compat,
-    .gpl_only    = true,
-    .ret_type    = RET_INTEGER,
-    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
-    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
-    .arg3_type    = ARG_ANYTHING,
-};
-
 static __always_inline int
-bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
-                 const bool compat)
+bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr)
 {
     int ret = security_locked_down(LOCKDOWN_BPF_READ);
 
     if (unlikely(ret < 0))
-        goto out;
+        goto fail;
+
     /*
-     * The strncpy_from_unsafe_*() call will likely not fill the entire
-     * buffer, but that's okay in this circumstance as we're probing
+     * The strncpy_from_kernel_nofault() call will likely not fill the
+     * entire buffer, but that's okay in this circumstance as we're probing
      * arbitrary memory anyway similar to bpf_probe_read_*() and might
      * as well probe the stack. Thus, memory is explicitly cleared
      * only in error case, so that improper users ignoring return
      * code altogether don't copy garbage; otherwise length of string
      * is returned that can be used for bpf_perf_event_output() et al.
      */
-    ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
-          strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
+    ret = strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
-out:
-        memset(dst, 0, size);
+        goto fail;
+
+    return 0;
+fail:
+    memset(dst, 0, size);
     return ret;
 }
 
 BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
        const void *, unsafe_ptr)
 {
-    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
+    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr);
 }
 
 static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
@@ -264,10 +264,34 @@ static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
     .arg3_type    = ARG_ANYTHING,
 };
 
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    if ((unsigned long)unsafe_ptr < TASK_SIZE) {
+        return bpf_probe_read_user_common(dst, size,
+                (__force void __user *)unsafe_ptr);
+    }
+    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr);
+}
+
+static const struct bpf_func_proto bpf_probe_read_compat_proto = {
+    .func        = bpf_probe_read_compat,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
 BPF_CALL_3(bpf_probe_read_compat_str, void *, dst, u32, size,
        const void *, unsafe_ptr)
 {
-    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, true);
+    if ((unsigned long)unsafe_ptr < TASK_SIZE) {
+        return bpf_probe_read_user_str_common(dst, size,
+                (__force void __user *)unsafe_ptr);
+    }
+    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr);
 }
 
 static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
@@ -278,6 +302,7 @@ static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
     .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
     .arg3_type    = ARG_ANYTHING,
 };
+#endif /* CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE */
 
 BPF_CALL_3(bpf_probe_write_user, void __user *, unsafe_ptr, const void *, src,
        u32, size)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/8] bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (7 preceding siblings ...)
  2021-04-21 13:14 ` [PATCH 8/8] bpf: rework the compat kernel probe handling Zidenberg, Tsahi
@ 2021-04-21 13:15 ` Zidenberg, Tsahi
  2021-04-21 13:18 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
  2021-04-23 15:08 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
  10 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 13:15 UTC (permalink / raw)
  To: stable; +Cc: Greg KH


commit 6ae08ae3dea2cfa03dd3665a3c8475c2d429ef47 upstream

The current bpf_probe_read() and bpf_probe_read_str() helpers are broken
in that they assume they can be used for probing memory access for kernel
space addresses /as well as/ user space addresses.

However, plain use of probe_kernel_read() for both cases will attempt to
always access kernel space address space given access is performed under
KERNEL_DS and some archs in-fact have overlapping address spaces where a
kernel pointer and user pointer would have the /same/ address value and
therefore accessing application memory via bpf_probe_read{,_str}() would
read garbage values.

Let's fix the BPF side by making use of the recently added 3d7081822f7f ("uaccess:
Add non-pagefault user-space read functions"). Unfortunately, the only way
to fix this status quo is to add dedicated bpf_probe_read_{user,kernel}()
and bpf_probe_read_{user,kernel}_str() helpers. The bpf_probe_read{,_str}()
helpers are kept as-is to retain their current behavior.

The two *_user() variants attempt the access always under USER_DS set, the
two *_kernel() variants will -EFAULT when accessing user memory if the
underlying architecture has non-overlapping address ranges, also avoiding
throwing the kernel warning via 00c42373d397 ("x86-64: add warning for
non-canonical user access address dereferences").

conflict resolution:
upstream also defines the skb_output helper. This commit adds skb_output to
FUNC_MAPPER, but does not take the code that implements it or the comments
documenting it.
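
For reference, here is a minimal sketch of how a tracing program would use the
new user-string helper added by this patch (not part of the patch itself; it
assumes a libbpf build with a bpftool-generated vmlinux.h, bpf_tracing.h built
with -D__TARGET_ARCH_x86 or the equivalent for your architecture, and
do_sys_open is only an example attach point):

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define PATHLEN 256    /* arbitrary buffer size for this example */

SEC("kprobe/do_sys_open")
int BPF_KPROBE(trace_open, int dfd, const char *filename)
{
    char buf[PATHLEN];
    /* filename is a user pointer, so use the dedicated user helper
     * instead of the old bpf_probe_read_str().
     */
    int res = bpf_probe_read_user_str(buf, sizeof(buf), filename);

    if (res > 0)    /* res includes the trailing NUL */
        bpf_printk("open: %s", buf);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";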

Fixes: a5e8c07059d0 ("bpf: add bpf_probe_read_str helper")
Fixes: 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Tsahi Zidenberg <tsahee@amazon.com>
---
 include/uapi/linux/bpf.h       | 123 ++++++++++++++--------
 kernel/trace/bpf_trace.c       | 181 ++++++++++++++++++++++++---------
 tools/include/uapi/linux/bpf.h | 116 +++++++++++++--------
 3 files changed, 294 insertions(+), 126 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8649422e760c..2fe91a083f7a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -560,10 +560,13 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * int bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
  *         For tracing programs, safely attempt to read *size* bytes from
- *         address *src* and store the data in *dst*.
+ *         kernel space address *unsafe_ptr* and store the data in *dst*.
+ *
+ *         Generally, use bpf_probe_read_user() or bpf_probe_read_kernel()
+ *         instead.
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
@@ -1425,45 +1428,14 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+ * int bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
- *         Copy a NUL terminated string from an unsafe address
- *         *unsafe_ptr* to *dst*. The *size* should include the
- *         terminating NUL byte. In case the string length is smaller than
- *         *size*, the target is not padded with further NUL bytes. If the
- *         string length is larger than *size*, just *size*-1 bytes are
- *         copied and the last byte is set to NUL.
- *
- *         On success, the length of the copied string is returned. This
- *         makes this helper useful in tracing programs for reading
- *         strings, and more importantly to get its length at runtime. See
- *         the following snippet:
- *
- *         ::
- *
- *             SEC("kprobe/sys_open")
- *             void bpf_sys_open(struct pt_regs *ctx)
- *             {
- *                     char buf[PATHLEN]; // PATHLEN is defined to 256
- *                     int res = bpf_probe_read_str(buf, sizeof(buf),
- *                                              ctx->di);
- *
- *                 // Consume buf, for example push it to
- *                 // userspace via bpf_perf_event_output(); we
- *                 // can use res (the string length) as event
- *                 // size, after checking its boundaries.
- *             }
- *
- *         In comparison, using **bpf_probe_read()** helper here instead
- *         to read the string would require to estimate the length at
- *         compile time, and would often result in copying more memory
- *         than necessary.
+ *         Copy a NUL terminated string from an unsafe kernel address
+ *         *unsafe_ptr* to *dst*. See bpf_probe_read_kernel_str() for
+ *         more details.
  *
- *         Another useful use case is when parsing individual process
- *         arguments or individual environment variables navigating
- *         *current*\ **->mm->arg_start** and *current*\
- *         **->mm->env_start**: using this helper and the return value,
- *         one can quickly iterate at the right offset of the memory area.
+ *         Generally, use bpf_probe_read_user_str() or bpf_probe_read_kernel_str()
+ *         instead.
  *     Return
  *         On success, the strictly positive length of the string,
  *         including the trailing NUL character. On error, a negative
@@ -2750,6 +2722,72 @@ union bpf_attr {
  *        **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *        **-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Safely attempt to read *size* bytes from user space address
+ *         *unsafe_ptr* and store the data in *dst*.
+ *     Return
+ *         0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Safely attempt to read *size* bytes from kernel space address
+ *         *unsafe_ptr* and store the data in *dst*.
+ *     Return
+ *         0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_user_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe user address
+ *         *unsafe_ptr* to *dst*. The *size* should include the
+ *         terminating NUL byte. In case the string length is smaller than
+ *         *size*, the target is not padded with further NUL bytes. If the
+ *         string length is larger than *size*, just *size*-1 bytes are
+ *         copied and the last byte is set to NUL.
+ *
+ *         On success, the length of the copied string is returned. This
+ *         makes this helper useful in tracing programs for reading
+ *         strings, and more importantly to get its length at runtime. See
+ *         the following snippet:
+ *
+ *         ::
+ *
+ *             SEC("kprobe/sys_open")
+ *             void bpf_sys_open(struct pt_regs *ctx)
+ *             {
+ *                     char buf[PATHLEN]; // PATHLEN is defined to 256
+ *                     int res = bpf_probe_read_user_str(buf, sizeof(buf),
+ *                                                   ctx->di);
+ *
+ *                 // Consume buf, for example push it to
+ *                 // userspace via bpf_perf_event_output(); we
+ *                 // can use res (the string length) as event
+ *                 // size, after checking its boundaries.
+ *             }
+ *
+ *         In comparison, using **bpf_probe_read_user()** helper here
+ *         instead to read the string would require to estimate the length
+ *         at compile time, and would often result in copying more memory
+ *         than necessary.
+ *
+ *         Another useful use case is when parsing individual process
+ *         arguments or individual environment variables navigating
+ *         *current*\ **->mm->arg_start** and *current*\
+ *         **->mm->env_start**: using this helper and the return value,
+ *         one can quickly iterate at the right offset of the memory area.
+ *     Return
+ *         On success, the strictly positive length of the string,
+ *         including the trailing NUL character. On error, a negative
+ *         value.
+ *
+ * int bpf_probe_read_kernel_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe kernel address *unsafe_ptr*
+ *         to *dst*. Same semantics as with bpf_probe_read_user_str() apply.
+ *     Return
+ *         On success, the strictly positive length of the string,    including
+ *         the trailing NUL character. On error, a negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)        \
     FN(unspec),            \
@@ -2862,7 +2900,12 @@ union bpf_attr {
     FN(sk_storage_get),        \
     FN(sk_storage_delete),        \
     FN(send_signal),        \
-    FN(tcp_gen_syncookie),
+    FN(tcp_gen_syncookie),        \
+    FN(skb_output),            \
+    FN(probe_read_user),        \
+    FN(probe_read_kernel),        \
+    FN(probe_read_user_str),    \
+    FN(probe_read_kernel_str),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 74c1db7178cf..0e329d48ab08 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -138,24 +138,140 @@ static const struct bpf_func_proto bpf_override_return_proto = {
 };
 #endif
 
-BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
+BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
+       const void __user *, unsafe_ptr)
 {
-    int ret;
+    int ret = probe_user_read(dst, unsafe_ptr, size);
 
-    ret = security_locked_down(LOCKDOWN_BPF_READ);
-    if (ret < 0)
-        goto out;
+    if (unlikely(ret < 0))
+        memset(dst, 0, size);
+
+    return ret;
+}
+
+static const struct bpf_func_proto bpf_probe_read_user_proto = {
+    .func        = bpf_probe_read_user,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
+       const void __user *, unsafe_ptr)
+{
+    int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
+
+    if (unlikely(ret < 0))
+        memset(dst, 0, size);
+
+    return ret;
+}
+
+static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
+    .func        = bpf_probe_read_user_str,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
 
-    ret = probe_kernel_read(dst, unsafe_ptr, size);
+static __always_inline int
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
+                 const bool compat)
+{
+    int ret = security_locked_down(LOCKDOWN_BPF_READ);
+
+    if (unlikely(ret < 0))
+        goto out;
+    ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
+          probe_kernel_read_strict(dst, unsafe_ptr, size);
     if (unlikely(ret < 0))
 out:
         memset(dst, 0, size);
+    return ret;
+}
+
+BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
+}
+
+static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
+    .func        = bpf_probe_read_kernel,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, true);
+}
 
+static const struct bpf_func_proto bpf_probe_read_compat_proto = {
+    .func        = bpf_probe_read_compat,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+static __always_inline int
+bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
+                 const bool compat)
+{
+    int ret = security_locked_down(LOCKDOWN_BPF_READ);
+
+    if (unlikely(ret < 0))
+        goto out;
+    /*
+     * The strncpy_from_unsafe_*() call will likely not fill the entire
+     * buffer, but that's okay in this circumstance as we're probing
+     * arbitrary memory anyway similar to bpf_probe_read_*() and might
+     * as well probe the stack. Thus, memory is explicitly cleared
+     * only in error case, so that improper users ignoring return
+     * code altogether don't copy garbage; otherwise length of string
+     * is returned that can be used for bpf_perf_event_output() et al.
+     */
+    ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
+          strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
+    if (unlikely(ret < 0))
+out:
+        memset(dst, 0, size);
     return ret;
 }
 
-static const struct bpf_func_proto bpf_probe_read_proto = {
-    .func        = bpf_probe_read,
+BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
+}
+
+static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
+    .func        = bpf_probe_read_kernel_str,
+    .gpl_only    = true,
+    .ret_type    = RET_INTEGER,
+    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
+    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
+    .arg3_type    = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_probe_read_compat_str, void *, dst, u32, size,
+       const void *, unsafe_ptr)
+{
+    return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, true);
+}
+
+static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
+    .func        = bpf_probe_read_compat_str,
     .gpl_only    = true,
     .ret_type    = RET_INTEGER,
     .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
@@ -583,41 +699,6 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
     .arg2_type      = ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
-       const void *, unsafe_ptr)
-{
-    int ret;
-
-    ret = security_locked_down(LOCKDOWN_BPF_READ);
-    if (ret < 0)
-        goto out;
-
-    /*
-     * The strncpy_from_unsafe() call will likely not fill the entire
-     * buffer, but that's okay in this circumstance as we're probing
-     * arbitrary memory anyway similar to bpf_probe_read() and might
-     * as well probe the stack. Thus, memory is explicitly cleared
-     * only in error case, so that improper users ignoring return
-     * code altogether don't copy garbage; otherwise length of string
-     * is returned that can be used for bpf_perf_event_output() et al.
-     */
-    ret = strncpy_from_unsafe(dst, unsafe_ptr, size);
-    if (unlikely(ret < 0))
-out:
-        memset(dst, 0, size);
-
-    return ret;
-}
-
-static const struct bpf_func_proto bpf_probe_read_str_proto = {
-    .func        = bpf_probe_read_str,
-    .gpl_only    = true,
-    .ret_type    = RET_INTEGER,
-    .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
-    .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
-    .arg3_type    = ARG_ANYTHING,
-};
-
 struct send_signal_irq_work {
     struct irq_work irq_work;
     struct task_struct *task;
@@ -697,8 +778,6 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
         return &bpf_map_pop_elem_proto;
     case BPF_FUNC_map_peek_elem:
         return &bpf_map_peek_elem_proto;
-    case BPF_FUNC_probe_read:
-        return &bpf_probe_read_proto;
     case BPF_FUNC_ktime_get_ns:
         return &bpf_ktime_get_ns_proto;
     case BPF_FUNC_tail_call:
@@ -725,8 +804,18 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
         return &bpf_current_task_under_cgroup_proto;
     case BPF_FUNC_get_prandom_u32:
         return &bpf_get_prandom_u32_proto;
+    case BPF_FUNC_probe_read_user:
+        return &bpf_probe_read_user_proto;
+    case BPF_FUNC_probe_read_kernel:
+        return &bpf_probe_read_kernel_proto;
+    case BPF_FUNC_probe_read:
+        return &bpf_probe_read_compat_proto;
+    case BPF_FUNC_probe_read_user_str:
+        return &bpf_probe_read_user_str_proto;
+    case BPF_FUNC_probe_read_kernel_str:
+        return &bpf_probe_read_kernel_str_proto;
     case BPF_FUNC_probe_read_str:
-        return &bpf_probe_read_str_proto;
+        return &bpf_probe_read_compat_str_proto;
 #ifdef CONFIG_CGROUPS
     case BPF_FUNC_get_current_cgroup_id:
         return &bpf_get_current_cgroup_id_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 8649422e760c..e4014608849d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -560,10 +560,13 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * int bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
  *         For tracing programs, safely attempt to read *size* bytes from
- *         address *src* and store the data in *dst*.
+ *         kernel space address *unsafe_ptr* and store the data in *dst*.
+ *
+ *         Generally, use bpf_probe_read_user() or bpf_probe_read_kernel()
+ *         instead.
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
@@ -1425,45 +1428,14 @@ union bpf_attr {
  *     Return
  *         0 on success, or a negative error in case of failure.
  *
- * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+ * int bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
  *     Description
- *         Copy a NUL terminated string from an unsafe address
- *         *unsafe_ptr* to *dst*. The *size* should include the
- *         terminating NUL byte. In case the string length is smaller than
- *         *size*, the target is not padded with further NUL bytes. If the
- *         string length is larger than *size*, just *size*-1 bytes are
- *         copied and the last byte is set to NUL.
- *
- *         On success, the length of the copied string is returned. This
- *         makes this helper useful in tracing programs for reading
- *         strings, and more importantly to get its length at runtime. See
- *         the following snippet:
- *
- *         ::
- *
- *             SEC("kprobe/sys_open")
- *             void bpf_sys_open(struct pt_regs *ctx)
- *             {
- *                     char buf[PATHLEN]; // PATHLEN is defined to 256
- *                     int res = bpf_probe_read_str(buf, sizeof(buf),
- *                                              ctx->di);
- *
- *                 // Consume buf, for example push it to
- *                 // userspace via bpf_perf_event_output(); we
- *                 // can use res (the string length) as event
- *                 // size, after checking its boundaries.
- *             }
+ *         Copy a NUL terminated string from an unsafe kernel address
+ *         *unsafe_ptr* to *dst*. See bpf_probe_read_kernel_str() for
+ *         more details.
  *
- *         In comparison, using **bpf_probe_read()** helper here instead
- *         to read the string would require to estimate the length at
- *         compile time, and would often result in copying more memory
- *         than necessary.
- *
- *         Another useful use case is when parsing individual process
- *         arguments or individual environment variables navigating
- *         *current*\ **->mm->arg_start** and *current*\
- *         **->mm->env_start**: using this helper and the return value,
- *         one can quickly iterate at the right offset of the memory area.
+ *         Generally, use bpf_probe_read_user_str() or bpf_probe_read_kernel_str()
+ *         instead.
  *     Return
  *         On success, the strictly positive length of the string,
  *         including the trailing NUL character. On error, a negative
@@ -2750,6 +2722,65 @@ union bpf_attr {
  *        **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *        **-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Safely attempt to read *size* bytes from kernel space address
+ *         *unsafe_ptr* and store the data in *dst*.
+ *     Return
+ *         0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_user_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe user address
+ *         *unsafe_ptr* to *dst*. The *size* should include the
+ *         terminating NUL byte. In case the string length is smaller than
+ *         *size*, the target is not padded with further NUL bytes. If the
+ *         string length is larger than *size*, just *size*-1 bytes are
+ *         copied and the last byte is set to NUL.
+ *
+ *         On success, the length of the copied string is returned. This
+ *         makes this helper useful in tracing programs for reading
+ *         strings, and more importantly to get its length at runtime. See
+ *         the following snippet:
+ *
+ *         ::
+ *
+ *             SEC("kprobe/sys_open")
+ *             void bpf_sys_open(struct pt_regs *ctx)
+ *             {
+ *                     char buf[PATHLEN]; // PATHLEN is defined to 256
+ *                     int res = bpf_probe_read_user_str(buf, sizeof(buf),
+ *                                                   ctx->di);
+ *
+ *                 // Consume buf, for example push it to
+ *                 // userspace via bpf_perf_event_output(); we
+ *                 // can use res (the string length) as event
+ *                 // size, after checking its boundaries.
+ *             }
+ *
+ *         In comparison, using **bpf_probe_read_user()** helper here
+ *         instead to read the string would require to estimate the length
+ *         at compile time, and would often result in copying more memory
+ *         than necessary.
+ *
+ *         Another useful use case is when parsing individual process
+ *         arguments or individual environment variables navigating
+ *         *current*\ **->mm->arg_start** and *current*\
+ *         **->mm->env_start**: using this helper and the return value,
+ *         one can quickly iterate at the right offset of the memory area.
+ *     Return
+ *         On success, the strictly positive length of the string,
+ *         including the trailing NUL character. On error, a negative
+ *         value.
+ *
+ * int bpf_probe_read_kernel_str(void *dst, u32 size, const void *unsafe_ptr)
+ *     Description
+ *         Copy a NUL terminated string from an unsafe kernel address *unsafe_ptr*
+ *         to *dst*. Same semantics as with bpf_probe_read_user_str() apply.
+ *     Return
+ *         On success, the strictly positive length of the string,    including
+ *         the trailing NUL character. On error, a negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)        \
     FN(unspec),            \
@@ -2862,7 +2893,12 @@ union bpf_attr {
     FN(sk_storage_get),        \
     FN(sk_storage_delete),        \
     FN(send_signal),        \
-    FN(tcp_gen_syncookie),
+    FN(tcp_gen_syncookie),        \
+    FN(skb_output),            \
+    FN(probe_read_user),        \
+    FN(probe_read_kernel),        \
+    FN(probe_read_user_str),    \
+    FN(probe_read_kernel_str),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}()
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (8 preceding siblings ...)
  2021-04-21 13:15 ` [PATCH 2/8] bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
@ 2021-04-21 13:18 ` Greg KH
  2021-04-21 14:27   ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{, str}() Zidenberg, Tsahi
  2021-04-23 15:08 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
  10 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2021-04-21 13:18 UTC (permalink / raw)
  To: Zidenberg, Tsahi; +Cc: stable

On Wed, Apr 21, 2021 at 04:05:32PM +0300, Zidenberg, Tsahi wrote:
> In arm64, kernelspace address accessors cannot be used to access
> userspace addresses, which means bpf_probe_read{,str}() cannot access
> userspace addresses. That causes e.g. command-line parameters to not
> appear when snooping execve using bpf.
> 
> This patch series takes the upstream solution. This solution also
> changes user API in the following ways:
> * Add probe_read_{user, kernel}{,_str} bpf helpers
> * Add skb_output helper to the enum only (calling it not supported)
> * Add support for %pks, %pus specifiers
> 
> An alternative fix only takes the required logic to existing API without
> adding new API, was suggested here:
> https://www.spinics.net/lists/stable/msg454945.html
> 
> Another option is to only take patches [1-4] of this patchset, and add
> on top of them commit 8d92db5c04d1 ("bpf: rework the compat kernel probe
> handling"). In that case, the last patch would require function renames
> and conflict resolutions that were avoided in this patchset by pulling
> patches [5-7].

What stable tree(s) are you expecting these to be relevant for?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{, str}()
  2021-04-21 13:18 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
@ 2021-04-21 14:27   ` Zidenberg, Tsahi
  0 siblings, 0 replies; 15+ messages in thread
From: Zidenberg, Tsahi @ 2021-04-21 14:27 UTC (permalink / raw)
  To: Greg KH; +Cc: stable



On 21/04/2021 16:18, Greg KH wrote:
> On Wed, Apr 21, 2021 at 04:05:32PM +0300, Zidenberg, Tsahi wrote:
>> In arm64, kernelspace address accessors cannot be used to access
>> userspace addresses, which means bpf_probe_read{,str}() cannot access
>> userspace addresses. That causes e.g. command-line parameters to not
>> appear when snooping execve using bpf.
>>
>> This patch series takes the upstream solution. This solution also
>> changes user API in the following ways:
>> * Add probe_read_{user, kernel}{,_str} bpf helpers
>> * Add skb_output helper to the enum only (calling it not supported)
>> * Add support for %pks, %pus specifiers
>>
>> An alternative fix only takes the required logic to existing API without
>> adding new API, was suggested here:
>> https://www.spinics.net/lists/stable/msg454945.html
>>
>> Another option is to only take patches [1-4] of this patchset, and add
>> on top of them commit 8d92db5c04d1 ("bpf: rework the compat kernel probe
>> handling"). In that case, the last patch would require function renames
>> and conflict resolutions that were avoided in this patchset by pulling
>> patches [5-7].
> What stable tree(s) are you expecting these to be relevant for?
Sorry, I forgot to mention it here.
This patchset was created/tested for 5.4.
I expect it to also be relevant for 4.19, but I didn't test it.
If you like it for 5.4 I could test and resubmit or notify you.
No updates are necessary for 5.10.

Thank you!
Tsahi.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
  2021-04-21 13:08 ` bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
@ 2021-04-23 15:06   ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2021-04-23 15:06 UTC (permalink / raw)
  To: Zidenberg, Tsahi; +Cc: stable

On Wed, Apr 21, 2021 at 04:08:19PM +0300, Zidenberg, Tsahi wrote:
> commit 6ae08ae3dea2cfa03dd3665a3c8475c2d429ef47 upstream

<snip>

Why was this commit sent twice?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}()
  2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
                   ` (9 preceding siblings ...)
  2021-04-21 13:18 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
@ 2021-04-23 15:08 ` Greg KH
  2021-04-24 14:47   ` Greg KH
  10 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2021-04-23 15:08 UTC (permalink / raw)
  To: Zidenberg, Tsahi; +Cc: stable

On Wed, Apr 21, 2021 at 04:05:32PM +0300, Zidenberg, Tsahi wrote:
> In arm64, kernelspace address accessors cannot be used to access
> userspace addresses, which means bpf_probe_read{,str}() cannot access
> userspace addresses. That causes e.g. command-line parameters to not
> appear when snooping execve using bpf.

Again, this really feels like a new feature, not a regression or bugfix
at all.  And in looking at these patches, that feeling really gets
stronger.

> This patch series takes the upstream solution. This solution also
> changes user API in the following ways:
> * Add probe_read_{user, kernel}{,_str} bpf helpers
> * Add skb_output helper to the enum only (calling it not supported)
> * Add support for %pks, %pus specifiers
> 
> An alternative fix only takes the required logic to existing API without
> adding new API, was suggested here:
> https://www.spinics.net/lists/stable/msg454945.html
> 
> Another option is to only take patches [1-4] of this patchset, and add
> on top of them commit 8d92db5c04d1 ("bpf: rework the compat kernel probe
> handling"). In that case, the last patch would require function renames
> and conflict resolutions that were avoided in this patchset by pulling
> patches [5-7].

The other option is to move your system to a newer kernel release that
has this new feature, right?  :)

What prevents you from doing that today?  What bug is this solving that
worked in previous kernel releases and was broken in 5.4.y?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}()
  2021-04-23 15:08 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
@ 2021-04-24 14:47   ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2021-04-24 14:47 UTC (permalink / raw)
  To: Zidenberg, Tsahi; +Cc: stable

On Fri, Apr 23, 2021 at 05:08:03PM +0200, Greg KH wrote:
> On Wed, Apr 21, 2021 at 04:05:32PM +0300, Zidenberg, Tsahi wrote:
> > In arm64, kernelspace address accessors cannot be used to access
> > userspace addresses, which means bpf_probe_read{,str}() cannot access
> > userspace addresses. That causes e.g. command-line parameters to not
> > appear when snooping execve using bpf.
> 
> Again, this really feels like a new feature, not a regression or bugfix
> at all.  And in looking at these patches, that feeling really gets
> stronger.
> 
> > This patch series takes the upstream solution. This solution also
> > changes user API in the following ways:
> > * Add probe_read_{user, kernel}{,_str} bpf helpers
> > * Add skb_output helper to the enum only (calling it not supported)
> > * Add support for %pks, %pus specifiers
> > 
> > An alternative fix only takes the required logic to existing API without
> > adding new API, was suggested here:
> > https://www.spinics.net/lists/stable/msg454945.html
> > 
> > Another option is to only take patches [1-4] of this patchset, and add
> > on top of them commit 8d92db5c04d1 ("bpf: rework the compat kernel probe
> > handling"). In that case, the last patch would require function renames
> > and conflict resolutions that were avoided in this patchset by pulling
> > patches [5-7].
> 
> The other option is to move your system to a newer kernel release that
> has this new feature, right?  :)
> 
> What prevents you from doing that today?  What bug is this solving that
> worked in previous kernel releases and was broken in 5.4.y?

And again, "feature parity across CPU architectures for the same
release" is nothing that Linux has EVER guaranteed...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

Thread overview: 15+ messages
2021-04-21 13:05 [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Zidenberg, Tsahi
2021-04-21 13:07 ` [PATCH 1/8] uaccess: Add strict non-pagefault kernel-space read function Zidenberg, Tsahi
2021-04-21 13:08 ` bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
2021-04-23 15:06   ` Greg KH
2021-04-21 13:09 ` [PATCH 3/8] bpf: Restrict bpf_probe_read{, str}() only to archs where they work Zidenberg, Tsahi
2021-04-21 13:10 ` [PATCH 4/8] powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again Zidenberg, Tsahi
2021-04-21 13:11 ` [PATCH 5/8] bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier Zidenberg, Tsahi
2021-04-21 13:12 ` [PATCH 6/8] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault Zidenberg, Tsahi
2021-04-21 13:13 ` [PATCH 7/8] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault Zidenberg, Tsahi
2021-04-21 13:14 ` [PATCH 8/8] bpf: rework the compat kernel probe handling Zidenberg, Tsahi
2021-04-21 13:15 ` [PATCH 2/8] bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers Zidenberg, Tsahi
2021-04-21 13:18 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
2021-04-21 14:27   ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{, str}() Zidenberg, Tsahi
2021-04-23 15:08 ` [PATCH 0/8] Fix bpf: fix userspace access for bpf_probe_read{,str}() Greg KH
2021-04-24 14:47   ` Greg KH
