linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] bpf: Add support for reading user pointers
@ 2019-05-02 20:49 Joel Fernandes (Google)
  2019-05-03 12:12 ` Qais Yousef
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Joel Fernandes (Google) @ 2019-05-02 20:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Michal Gregorczyk, Adrian Ratiu, Mohammad Husain, Qais Yousef,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

The eBPF based opensnoop tool fails to read the file path string passed
to the do_sys_open function. This is because it is a pointer to
userspace address and causes an -EFAULT when read with
probe_kernel_read. This is not an issue when running the tool on x86 but
is an issue on arm64. This patch adds a new bpf function call based
which calls the recently proposed probe_user_read function [1].
Using this function call from opensnoop fixes the issue on arm64.

[1] https://lore.kernel.org/patchwork/patch/1051588/

Cc: Michal Gregorczyk <michalgr@live.com>
Cc: Adrian Ratiu <adrian.ratiu@collabora.com>
Cc: Mohammad Husain <russoue@gmail.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Srinivas Ramana <sramana@codeaurora.org>
Cc: duyuchao <yuchao.du@unisoc.com>
Cc: Manjo Raja Rao <linux@manojrajarao.com>
Cc: Karim Yaghmour <karim.yaghmour@opersys.com>
Cc: Tamir Carmeli <carmeli.tamir@gmail.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Ziljstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/uapi/linux/bpf.h       |  7 ++++++-
 kernel/trace/bpf_trace.c       | 22 ++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 ++++++-
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e99e3e6f8b37..6fec701eaa46 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -539,6 +539,10 @@ union bpf_attr {
  *     @mode: operation mode (enum bpf_adj_room_mode)
  *     @flags: reserved for future use
  *     Return: 0 on success or negative error code
+ *
+ * int bpf_probe_read_user(void *dst, int size, void *src)
+ *     Read a userspace pointer safely.
+ *     Return: 0 on success or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -591,7 +595,8 @@ union bpf_attr {
 	FN(get_socket_uid),		\
 	FN(set_hash),			\
 	FN(setsockopt),			\
-	FN(skb_adjust_room),
+	FN(skb_adjust_room),		\
+	FN(probe_read_user),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index dc498b605d5d..1e1a11d9faa8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -81,6 +81,26 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, unsafe_ptr)
+{
+	int ret;
+
+	ret = probe_user_read(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_probe_read_user_proto = {
+	.func		= bpf_probe_read_user,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
 	   u32, size)
 {
@@ -459,6 +479,8 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
 		return &bpf_map_delete_elem_proto;
 	case BPF_FUNC_probe_read:
 		return &bpf_probe_read_proto;
+	case BPF_FUNC_probe_read_user:
+		return &bpf_probe_read_user_proto;
 	case BPF_FUNC_ktime_get_ns:
 		return &bpf_ktime_get_ns_proto;
 	case BPF_FUNC_tail_call:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e99e3e6f8b37..6fec701eaa46 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -539,6 +539,10 @@ union bpf_attr {
  *     @mode: operation mode (enum bpf_adj_room_mode)
  *     @flags: reserved for future use
  *     Return: 0 on success or negative error code
+ *
+ * int bpf_probe_read_user(void *dst, int size, void *src)
+ *     Read a userspace pointer safely.
+ *     Return: 0 on success or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -591,7 +595,8 @@ union bpf_attr {
 	FN(get_socket_uid),		\
 	FN(set_hash),			\
 	FN(setsockopt),			\
-	FN(skb_adjust_room),
+	FN(skb_adjust_room),		\
+	FN(probe_read_user),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-02 20:49 [PATCH RFC] bpf: Add support for reading user pointers Joel Fernandes (Google)
@ 2019-05-03 12:12 ` Qais Yousef
  2019-05-03 13:49   ` Joel Fernandes
  2019-05-05  7:19 ` Alexei Starovoitov
  2019-05-06 14:47 ` Masami Hiramatsu
  2 siblings, 1 reply; 21+ messages in thread
From: Qais Yousef @ 2019-05-03 12:12 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

Hi Joel

On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> The eBPF based opensnoop tool fails to read the file path string passed
> to the do_sys_open function. This is because it is a pointer to
> userspace address and causes an -EFAULT when read with
> probe_kernel_read. This is not an issue when running the tool on x86 but
> is an issue on arm64. This patch adds a new bpf function call based

I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
correctly on arm64.

My guess either a limitation that was fixed on later kernel versions or Android
kernel has some strict option/modifications that make this fail?




root@buildroot:/# uname -a
Linux buildroot 5.1.0-rc7-00164-ga00214620959-dirty #41 SMP PREEMPT Thu May 2 16:33:00 BST 2019 aarch64 GNU/Linux
root@buildroot:/# opensnoop
PID    COMM               FD ERR PATH
5180   default.script     -1   2 /etc/ld.so.cache
5180   default.script     -1   2 /lib/tls/v8l/neon/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/tls/v8l/neon/libresolv.so.2
5180   default.script     -1   2 /lib/tls/v8l/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/tls/v8l/libresolv.so.2
5180   default.script     -1   2 /lib/tls/neon/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/tls/neon/libresolv.so.2
5180   default.script     -1   2 /lib/tls/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/tls/libresolv.so.2
5180   default.script     -1   2 /lib/v8l/neon/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/v8l/neon/libresolv.so.2
5180   default.script     -1   2 /lib/v8l/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/v8l/libresolv.so.2
5180   default.script     -1   2 /lib/neon/vfp/libresolv.so.2
5180   default.script     -1   2 /lib/neon/libresolv.so.2
5180   default.script     -1   2 /lib/vfp/libresolv.so.2
5180   default.script      3   0 /lib/libresolv.so.2
5180   default.script      3   0 /lib/libc.so.6
5180   default.script      3   0 /usr/share/udhcpc/default.script
5180   default.script      3   0 /usr/share/udhcpc/default.script.d/




--
Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-03 12:12 ` Qais Yousef
@ 2019-05-03 13:49   ` Joel Fernandes
  2019-05-03 13:54     ` Peter Zijlstra
  2019-05-05 11:04     ` Qais Yousef
  0 siblings, 2 replies; 21+ messages in thread
From: Joel Fernandes @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> Hi Joel
> 
> On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > The eBPF based opensnoop tool fails to read the file path string passed
> > to the do_sys_open function. This is because it is a pointer to
> > userspace address and causes an -EFAULT when read with
> > probe_kernel_read. This is not an issue when running the tool on x86 but
> > is an issue on arm64. This patch adds a new bpf function call based
> 
> I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> correctly on arm64.
> 
> My guess either a limitation that was fixed on later kernel versions or Android
> kernel has some strict option/modifications that make this fail?

Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).

I am not sure what has changed since then, but I still think it is a good
idea to make the code more robust against such future issues anyway. In
particular, we learnt with extensive discussions that user/kernel pointers
are not necessarily distinguishable purely based on their address.

I hope agree this is an issue we need to fix.

See these discussions:

https://lkml.kernel.org/r/20190220171019.5e81a4946b56982f324f7c45@kernel.org
https://lore.kernel.org/lkml/20190220171019.5e81a4946b56982f324f7c45@kernel.org/T/#mf81816dbfe25ac5d0e96fbab029050e892f73af2

thanks,

 - Joel

> root@buildroot:/# uname -a
> Linux buildroot 5.1.0-rc7-00164-ga00214620959-dirty #41 SMP PREEMPT Thu May 2 16:33:00 BST 2019 aarch64 GNU/Linux
> root@buildroot:/# opensnoop
> PID    COMM               FD ERR PATH
> 5180   default.script     -1   2 /etc/ld.so.cache
> 5180   default.script     -1   2 /lib/tls/v8l/neon/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/v8l/neon/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/v8l/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/v8l/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/neon/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/neon/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/tls/libresolv.so.2
> 5180   default.script     -1   2 /lib/v8l/neon/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/v8l/neon/libresolv.so.2
> 5180   default.script     -1   2 /lib/v8l/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/v8l/libresolv.so.2
> 5180   default.script     -1   2 /lib/neon/vfp/libresolv.so.2
> 5180   default.script     -1   2 /lib/neon/libresolv.so.2
> 5180   default.script     -1   2 /lib/vfp/libresolv.so.2
> 5180   default.script      3   0 /lib/libresolv.so.2
> 5180   default.script      3   0 /lib/libc.so.6
> 5180   default.script      3   0 /usr/share/udhcpc/default.script
> 5180   default.script      3   0 /usr/share/udhcpc/default.script.d/
> 
> 
> 
> 
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-03 13:49   ` Joel Fernandes
@ 2019-05-03 13:54     ` Peter Zijlstra
  2019-05-03 15:09       ` Joel Fernandes
  2019-05-05 11:04     ` Qais Yousef
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2019-05-03 13:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Qais Yousef, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On Fri, May 03, 2019 at 09:49:35AM -0400, Joel Fernandes wrote:
> In
> particular, we learnt with extensive discussions that user/kernel pointers
> are not necessarily distinguishable purely based on their address.

This is correct; a number of architectures have a completely separate
user and kernel address space. Much like how the old i386 4G:4G patches
worked.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-03 13:54     ` Peter Zijlstra
@ 2019-05-03 15:09       ` Joel Fernandes
  0 siblings, 0 replies; 21+ messages in thread
From: Joel Fernandes @ 2019-05-03 15:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Qais Yousef, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On Fri, May 03, 2019 at 03:54:26PM +0200, Peter Zijlstra wrote:
> On Fri, May 03, 2019 at 09:49:35AM -0400, Joel Fernandes wrote:
> > In
> > particular, we learnt with extensive discussions that user/kernel pointers
> > are not necessarily distinguishable purely based on their address.
> 
> This is correct; a number of architectures have a completely separate
> user and kernel address space. Much like how the old i386 4G:4G patches
> worked.

Thanks Peter for confirming.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-02 20:49 [PATCH RFC] bpf: Add support for reading user pointers Joel Fernandes (Google)
  2019-05-03 12:12 ` Qais Yousef
@ 2019-05-05  7:19 ` Alexei Starovoitov
  2019-05-05 13:33   ` Joel Fernandes
  2019-05-06 14:47 ` Masami Hiramatsu
  2 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2019-05-05  7:19 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: LKML, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Qais Yousef, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, Android Kernel Team, Daniel Borkmann, Ingo Molnar,
	Network Development

On Thu, May 2, 2019 at 1:50 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> The eBPF based opensnoop tool fails to read the file path string passed
> to the do_sys_open function. This is because it is a pointer to
> userspace address and causes an -EFAULT when read with
> probe_kernel_read. This is not an issue when running the tool on x86 but
> is an issue on arm64. This patch adds a new bpf function call based
> which calls the recently proposed probe_user_read function [1].
> Using this function call from opensnoop fixes the issue on arm64.
>
> [1] https://lore.kernel.org/patchwork/patch/1051588/
...
> +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, unsafe_ptr)
> +{
> +       int ret;
> +
> +       ret = probe_user_read(dst, unsafe_ptr, size);
> +       if (unlikely(ret < 0))
> +               memset(dst, 0, size);
> +
> +       return ret;
> +}

probe_user_read() doesn't exist in bpf-next
therefore this patch has to wait for the next merge window.
At the same time we would need to introduce
bpf_probe_read_kernel() and introduce a load time warning
for existing bpf_probe_read(), so we can deprecate it eventually.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-03 13:49   ` Joel Fernandes
  2019-05-03 13:54     ` Peter Zijlstra
@ 2019-05-05 11:04     ` Qais Yousef
  2019-05-05 13:29       ` Joel Fernandes
  1 sibling, 1 reply; 21+ messages in thread
From: Qais Yousef @ 2019-05-05 11:04 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On 05/03/19 09:49, Joel Fernandes wrote:
> On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > Hi Joel
> > 
> > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > The eBPF based opensnoop tool fails to read the file path string passed
> > > to the do_sys_open function. This is because it is a pointer to
> > > userspace address and causes an -EFAULT when read with
> > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > is an issue on arm64. This patch adds a new bpf function call based
> > 
> > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > correctly on arm64.
> > 
> > My guess either a limitation that was fixed on later kernel versions or Android
> > kernel has some strict option/modifications that make this fail?
> 
> Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> 
> I am not sure what has changed since then, but I still think it is a good
> idea to make the code more robust against such future issues anyway. In
> particular, we learnt with extensive discussions that user/kernel pointers
> are not necessarily distinguishable purely based on their address.

Yes I wasn't arguing against that. But the commit message is misleading or
needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
too. Why do you think it's an arm64 problem?

--
Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 11:04     ` Qais Yousef
@ 2019-05-05 13:29       ` Joel Fernandes
  2019-05-05 14:46         ` Qais Yousef
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Fernandes @ 2019-05-05 13:29 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On Sun, May 05, 2019 at 12:04:24PM +0100, Qais Yousef wrote:
> On 05/03/19 09:49, Joel Fernandes wrote:
> > On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > > Hi Joel
> > > 
> > > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > > The eBPF based opensnoop tool fails to read the file path string passed
> > > > to the do_sys_open function. This is because it is a pointer to
> > > > userspace address and causes an -EFAULT when read with
> > > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > > is an issue on arm64. This patch adds a new bpf function call based
> > > 
> > > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > > correctly on arm64.
> > > 
> > > My guess either a limitation that was fixed on later kernel versions or Android
> > > kernel has some strict option/modifications that make this fail?
> > 
> > Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> > 
> > I am not sure what has changed since then, but I still think it is a good
> > idea to make the code more robust against such future issues anyway. In
> > particular, we learnt with extensive discussions that user/kernel pointers
> > are not necessarily distinguishable purely based on their address.
> 
> Yes I wasn't arguing against that. But the commit message is misleading or
> needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
> too. Why do you think it's an arm64 problem?

Well it is broken on at least on at least one arm64 device and the patch I
sent fixes it. We know that the bpf is using wrong kernel API so why not fix
it? Are you saying we should not fix it like in this patch? Or do you have
another fix in mind?

thanks.


> 
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05  7:19 ` Alexei Starovoitov
@ 2019-05-05 13:33   ` Joel Fernandes
  0 siblings, 0 replies; 21+ messages in thread
From: Joel Fernandes @ 2019-05-05 13:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LKML, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Qais Yousef, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, Android Kernel Team, Daniel Borkmann, Ingo Molnar,
	Network Development

On Sun, May 05, 2019 at 12:19:42AM -0700, Alexei Starovoitov wrote:
> On Thu, May 2, 2019 at 1:50 PM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
> >
> > The eBPF based opensnoop tool fails to read the file path string passed
> > to the do_sys_open function. This is because it is a pointer to
> > userspace address and causes an -EFAULT when read with
> > probe_kernel_read. This is not an issue when running the tool on x86 but
> > is an issue on arm64. This patch adds a new bpf function call based
> > which calls the recently proposed probe_user_read function [1].
> > Using this function call from opensnoop fixes the issue on arm64.
> >
> > [1] https://lore.kernel.org/patchwork/patch/1051588/
> ...
> > +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, unsafe_ptr)
> > +{
> > +       int ret;
> > +
> > +       ret = probe_user_read(dst, unsafe_ptr, size);
> > +       if (unlikely(ret < 0))
> > +               memset(dst, 0, size);
> > +
> > +       return ret;
> > +}
> 
> probe_user_read() doesn't exist in bpf-next
> therefore this patch has to wait for the next merge window.
> At the same time we would need to introduce
> bpf_probe_read_kernel() and introduce a load time warning
> for existing bpf_probe_read(), so we can deprecate it eventually.

Ok I will update it accordingly. Agreed. thanks,

 - Joel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 13:29       ` Joel Fernandes
@ 2019-05-05 14:46         ` Qais Yousef
  2019-05-05 15:52           ` Joel Fernandes
  0 siblings, 1 reply; 21+ messages in thread
From: Qais Yousef @ 2019-05-05 14:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On 05/05/19 13:29, Joel Fernandes wrote:
> On Sun, May 05, 2019 at 12:04:24PM +0100, Qais Yousef wrote:
> > On 05/03/19 09:49, Joel Fernandes wrote:
> > > On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > > > Hi Joel
> > > > 
> > > > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > > > The eBPF based opensnoop tool fails to read the file path string passed
> > > > > to the do_sys_open function. This is because it is a pointer to
> > > > > userspace address and causes an -EFAULT when read with
> > > > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > > > is an issue on arm64. This patch adds a new bpf function call based
> > > > 
> > > > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > > > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > > > correctly on arm64.
> > > > 
> > > > My guess either a limitation that was fixed on later kernel versions or Android
> > > > kernel has some strict option/modifications that make this fail?
> > > 
> > > Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> > > 
> > > I am not sure what has changed since then, but I still think it is a good
> > > idea to make the code more robust against such future issues anyway. In
> > > particular, we learnt with extensive discussions that user/kernel pointers
> > > are not necessarily distinguishable purely based on their address.
> > 
> > Yes I wasn't arguing against that. But the commit message is misleading or
> > needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
> > too. Why do you think it's an arm64 problem?
> 
> Well it is broken on at least on at least one arm64 device and the patch I
> sent fixes it. We know that the bpf is using wrong kernel API so why not fix
> it? Are you saying we should not fix it like in this patch? Or do you have
> another fix in mind?

Again I have no issue with the new API. But the claim that it's a fix for
a broken arm64 is a big stretch. AFAICT you don't understand the root cause of
why copy_to_user_inatomic() fails in your case. Given that Android 4.9 has
its own patches on top of 4.9 stable, it might be something that was introduced
in one of these patches that breaks opensnoop, and by making it use the new API
you might be simply working around the problem. All I can see is that vanilla
4.9 stable works on arm64.

So I am happy about introducing the new API but not happy with the commit
message or the explanation given in it. Unless you can investigate the root
cause and relate how this fixes it (and not workaround a problem you're
specifically having) I think it's better to introduce this patch as a generic
new API that is more robust to handle reading __user data in BPF and drop
reference to opensnoop failures. They raise more questions and the real
intention of this patch anyway is to provide the new correct way for BPF
programs to read __user data regardless opensnoop fails or not AFAIU.

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 14:46         ` Qais Yousef
@ 2019-05-05 15:52           ` Joel Fernandes
  2019-05-05 18:03             ` Joel Fernandes
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Fernandes @ 2019-05-05 15:52 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On Sun, May 05, 2019 at 03:46:08PM +0100, Qais Yousef wrote:
> On 05/05/19 13:29, Joel Fernandes wrote:
> > On Sun, May 05, 2019 at 12:04:24PM +0100, Qais Yousef wrote:
> > > On 05/03/19 09:49, Joel Fernandes wrote:
> > > > On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > > > > Hi Joel
> > > > > 
> > > > > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > > > > The eBPF based opensnoop tool fails to read the file path string passed
> > > > > > to the do_sys_open function. This is because it is a pointer to
> > > > > > userspace address and causes an -EFAULT when read with
> > > > > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > > > > is an issue on arm64. This patch adds a new bpf function call based
> > > > > 
> > > > > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > > > > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > > > > correctly on arm64.
> > > > > 
> > > > > My guess either a limitation that was fixed on later kernel versions or Android
> > > > > kernel has some strict option/modifications that make this fail?
> > > > 
> > > > Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> > > > 
> > > > I am not sure what has changed since then, but I still think it is a good
> > > > idea to make the code more robust against such future issues anyway. In
> > > > particular, we learnt with extensive discussions that user/kernel pointers
> > > > are not necessarily distinguishable purely based on their address.
> > > 
> > > Yes I wasn't arguing against that. But the commit message is misleading or
> > > needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
> > > too. Why do you think it's an arm64 problem?
> > 
> > Well it is broken on at least on at least one arm64 device and the patch I
> > sent fixes it. We know that the bpf is using wrong kernel API so why not fix
> > it? Are you saying we should not fix it like in this patch? Or do you have
> > another fix in mind?
> 
> Again I have no issue with the new API. But the claim that it's a fix for
> a broken arm64 is a big stretch. AFAICT you don't understand the root cause of
> why copy_to_user_inatomic() fails in your case. Given that Android 4.9 has
> its own patches on top of 4.9 stable, it might be something that was introduced
> in one of these patches that breaks opensnoop, and by making it use the new API
> you might be simply working around the problem. All I can see is that vanilla
> 4.9 stable works on arm64.

Agreed that commit message could be improved. I believe issue is something to
do with differences in 4.9 PAN emulation backports. AIUI PAN was introduced
in upstream only in 4.10 so 4.9 needed backports.

I did not root cause this completely because "doing the right thing" fixed
the issue. I will look more closely once I am home.

Thank you.




> So I am happy about introducing the new API but not happy with the commit
> message or the explanation given in it. Unless you can investigate the root
> cause and relate how this fixes it (and not workaround a problem you're
> specifically having) I think it's better to introduce this patch as a generic
> new API that is more robust to handle reading __user data in BPF and drop
> reference to opensnoop failures. They raise more questions and the real
> intention of this patch anyway is to provide the new correct way for BPF
> programs to read __user data regardless opensnoop fails or not AFAIU.
> 
> Cheers
> 
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 15:52           ` Joel Fernandes
@ 2019-05-05 18:03             ` Joel Fernandes
  2019-05-05 18:51               ` Andrii Nakryiko
                                 ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Joel Fernandes @ 2019-05-05 18:03 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev, Mark Rutland,
	Will Deacon

On Sun, May 05, 2019 at 03:52:23PM +0000, Joel Fernandes wrote:
> On Sun, May 05, 2019 at 03:46:08PM +0100, Qais Yousef wrote:
> > On 05/05/19 13:29, Joel Fernandes wrote:
> > > On Sun, May 05, 2019 at 12:04:24PM +0100, Qais Yousef wrote:
> > > > On 05/03/19 09:49, Joel Fernandes wrote:
> > > > > On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > > > > > Hi Joel
> > > > > > 
> > > > > > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > > > > > The eBPF based opensnoop tool fails to read the file path string passed
> > > > > > > to the do_sys_open function. This is because it is a pointer to
> > > > > > > userspace address and causes an -EFAULT when read with
> > > > > > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > > > > > is an issue on arm64. This patch adds a new bpf function call based
> > > > > > 
> > > > > > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > > > > > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > > > > > correctly on arm64.
> > > > > > 
> > > > > > My guess either a limitation that was fixed on later kernel versions or Android
> > > > > > kernel has some strict option/modifications that make this fail?
> > > > > 
> > > > > Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> > > > > 
> > > > > I am not sure what has changed since then, but I still think it is a good
> > > > > idea to make the code more robust against such future issues anyway. In
> > > > > particular, we learnt with extensive discussions that user/kernel pointers
> > > > > are not necessarily distinguishable purely based on their address.
> > > > 
> > > > Yes I wasn't arguing against that. But the commit message is misleading or
> > > > needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
> > > > too. Why do you think it's an arm64 problem?
> > > 
> > > Well it is broken on at least on at least one arm64 device and the patch I
> > > sent fixes it. We know that the bpf is using wrong kernel API so why not fix
> > > it? Are you saying we should not fix it like in this patch? Or do you have
> > > another fix in mind?
> > 
> > Again I have no issue with the new API. But the claim that it's a fix for
> > a broken arm64 is a big stretch. AFAICT you don't understand the root cause of
> > why copy_to_user_inatomic() fails in your case. Given that Android 4.9 has
> > its own patches on top of 4.9 stable, it might be something that was introduced
> > in one of these patches that breaks opensnoop, and by making it use the new API
> > you might be simply working around the problem. All I can see is that vanilla
> > 4.9 stable works on arm64.
> 
> Agreed that commit message could be improved. I believe issue is something to
> do with differences in 4.9 PAN emulation backports. AIUI PAN was introduced
> in upstream only in 4.10 so 4.9 needed backports.
> 
> I did not root cause this completely because "doing the right thing" fixed
> the issue. I will look more closely once I am home.
> 
> Thank you.

+Mark, Will since discussion is about arm64 arch code.

The difference between observing the bug and everything just working seems to
be the set_fs(USER_DS) as done by Masami's patch that this patch is based on.
The following diff shows 'ret' as 255 when set_fs(KERN_DS) is used, and then
after we retry with set_fs(USER_DS), the read succeeds.

diff --git a/mm/maccess.c b/mm/maccess.c
index 78f9274dd49d..d3e01a33c712 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -32,9 +32,20 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 	pagefault_disable();
 	ret = __copy_from_user_inatomic(dst,
 			(__force const void __user *)src, size);
+	trace_printk("KERNEL_DS: __copy_from_user_inatomic: ret=%d\n", ret);
 	pagefault_enable();
 	set_fs(old_fs);
 
+	if (ret) {
+	set_fs(USER_DS);
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(dst,
+			(__force const void __user *)src, size);
+	trace_printk("RETRY WITH USER_DS: __copy_from_user_inatomic: ret=%d\n", ret);
+	pagefault_enable();
+	set_fs(old_fs);
+	}
+
 	return ret ? -EFAULT : 0;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);

In initially thought this was because of the addr_limit pointer masking done
by this patch from Mark Rutland "arm64: Use pointer masking to limit uaccess
speculation"

However removing this masking still makes it fail with KERNEL_DS.

Fwiw, I am still curious which other paths in arm64 check the addr_limit
which might make the __copy_from_user_inatomic fail if the set_fs is not
setup correctly.

Either way, I will resubmit the patch with the commit message fixed correctly
as we agreed and also address Alexei's comments.

Thank you!

 - Joel

-- 
2.21.0.1020.gf2820cf01a-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 18:03             ` Joel Fernandes
@ 2019-05-05 18:51               ` Andrii Nakryiko
  2019-05-06  0:01               ` Qais Yousef
  2019-05-06 18:35               ` Will Deacon
  2 siblings, 0 replies; 21+ messages in thread
From: Andrii Nakryiko @ 2019-05-05 18:51 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Qais Yousef, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, Networking,
	Mark Rutland, Will Deacon

On Sun, May 5, 2019 at 11:08 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Sun, May 05, 2019 at 03:52:23PM +0000, Joel Fernandes wrote:
> > On Sun, May 05, 2019 at 03:46:08PM +0100, Qais Yousef wrote:
> > > On 05/05/19 13:29, Joel Fernandes wrote:
> > > > On Sun, May 05, 2019 at 12:04:24PM +0100, Qais Yousef wrote:
> > > > > On 05/03/19 09:49, Joel Fernandes wrote:
> > > > > > On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > > > > > > Hi Joel
> > > > > > >
> > > > > > > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > > > > > > The eBPF based opensnoop tool fails to read the file path string passed
> > > > > > > > to the do_sys_open function. This is because it is a pointer to
> > > > > > > > userspace address and causes an -EFAULT when read with
> > > > > > > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > > > > > > is an issue on arm64. This patch adds a new bpf function call based
> > > > > > >
> > > > > > > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > > > > > > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > > > > > > correctly on arm64.
> > > > > > >
> > > > > > > My guess either a limitation that was fixed on later kernel versions or Android
> > > > > > > kernel has some strict option/modifications that make this fail?
> > > > > >
> > > > > > Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> > > > > >
> > > > > > I am not sure what has changed since then, but I still think it is a good
> > > > > > idea to make the code more robust against such future issues anyway. In
> > > > > > particular, we learnt with extensive discussions that user/kernel pointers
> > > > > > are not necessarily distinguishable purely based on their address.
> > > > >
> > > > > Yes I wasn't arguing against that. But the commit message is misleading or
> > > > > needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
> > > > > too. Why do you think it's an arm64 problem?
> > > >
> > > > Well it is broken on at least on at least one arm64 device and the patch I
> > > > sent fixes it. We know that the bpf is using wrong kernel API so why not fix
> > > > it? Are you saying we should not fix it like in this patch? Or do you have
> > > > another fix in mind?
> > >
> > > Again I have no issue with the new API. But the claim that it's a fix for
> > > a broken arm64 is a big stretch. AFAICT you don't understand the root cause of
> > > why copy_to_user_inatomic() fails in your case. Given that Android 4.9 has
> > > its own patches on top of 4.9 stable, it might be something that was introduced
> > > in one of these patches that breaks opensnoop, and by making it use the new API
> > > you might be simply working around the problem. All I can see is that vanilla
> > > 4.9 stable works on arm64.
> >
> > Agreed that commit message could be improved. I believe issue is something to
> > do with differences in 4.9 PAN emulation backports. AIUI PAN was introduced
> > in upstream only in 4.10 so 4.9 needed backports.
> >
> > I did not root cause this completely because "doing the right thing" fixed
> > the issue. I will look more closely once I am home.
> >
> > Thank you.
>
> +Mark, Will since discussion is about arm64 arch code.
>
> The difference between observing the bug and everything just working seems to
> be the set_fs(USER_DS) as done by Masami's patch that this patch is based on.
> The following diff shows 'ret' as 255 when set_fs(KERN_DS) is used, and then
> after we retry with set_fs(USER_DS), the read succeeds.
>
> diff --git a/mm/maccess.c b/mm/maccess.c
> index 78f9274dd49d..d3e01a33c712 100644
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -32,9 +32,20 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
>         pagefault_disable();
>         ret = __copy_from_user_inatomic(dst,
>                         (__force const void __user *)src, size);
> +       trace_printk("KERNEL_DS: __copy_from_user_inatomic: ret=%d\n", ret);
>         pagefault_enable();
>         set_fs(old_fs);
>
> +       if (ret) {
> +       set_fs(USER_DS);
> +       pagefault_disable();
> +       ret = __copy_from_user_inatomic(dst,
> +                       (__force const void __user *)src, size);
> +       trace_printk("RETRY WITH USER_DS: __copy_from_user_inatomic: ret=%d\n", ret);
> +       pagefault_enable();
> +       set_fs(old_fs);
> +       }
> +
>         return ret ? -EFAULT : 0;
>  }
>  EXPORT_SYMBOL_GPL(probe_kernel_read);
>
> In initially thought this was because of the addr_limit pointer masking done
> by this patch from Mark Rutland "arm64: Use pointer masking to limit uaccess
> speculation"
>
> However removing this masking still makes it fail with KERNEL_DS.
>
> Fwiw, I am still curious which other paths in arm64 check the addr_limit
> which might make the __copy_from_user_inatomic fail if the set_fs is not
> setup correctly.
>
> Either way, I will resubmit the patch with the commit message fixed correctly
> as we agreed and also address Alexei's comments.

Can you please also split the sync of tools/include/uapi/linux/bpf.h
into separate commit. Having it along the other changes causes issues
when syncing libbpf code (and headers it depends on) to a github
mirror (github.com/libbpf/libbpf). Thanks!

>
> Thank you!
>
>  - Joel
>
> --
> 2.21.0.1020.gf2820cf01a-goog
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 18:03             ` Joel Fernandes
  2019-05-05 18:51               ` Andrii Nakryiko
@ 2019-05-06  0:01               ` Qais Yousef
  2019-05-06 18:35               ` Will Deacon
  2 siblings, 0 replies; 21+ messages in thread
From: Qais Yousef @ 2019-05-06  0:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Srinivas Ramana, duyuchao, Manjo Raja Rao, Karim Yaghmour,
	Tamir Carmeli, Yonghong Song, Alexei Starovoitov, Brendan Gregg,
	Masami Hiramatsu, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev, Mark Rutland,
	Will Deacon

On 05/05/19 14:03, Joel Fernandes wrote:
> On Sun, May 05, 2019 at 03:52:23PM +0000, Joel Fernandes wrote:
> > On Sun, May 05, 2019 at 03:46:08PM +0100, Qais Yousef wrote:
> > > On 05/05/19 13:29, Joel Fernandes wrote:
> > > > On Sun, May 05, 2019 at 12:04:24PM +0100, Qais Yousef wrote:
> > > > > On 05/03/19 09:49, Joel Fernandes wrote:
> > > > > > On Fri, May 03, 2019 at 01:12:34PM +0100, Qais Yousef wrote:
> > > > > > > Hi Joel
> > > > > > > 
> > > > > > > On 05/02/19 16:49, Joel Fernandes (Google) wrote:
> > > > > > > > The eBPF based opensnoop tool fails to read the file path string passed
> > > > > > > > to the do_sys_open function. This is because it is a pointer to
> > > > > > > > userspace address and causes an -EFAULT when read with
> > > > > > > > probe_kernel_read. This is not an issue when running the tool on x86 but
> > > > > > > > is an issue on arm64. This patch adds a new bpf function call based
> > > > > > > 
> > > > > > > I just did an experiment and if I use Android 4.9 kernel I indeed fail to see
> > > > > > > PATH info when running opensnoop. But if I run on 5.1-rc7 opensnoop behaves
> > > > > > > correctly on arm64.
> > > > > > > 
> > > > > > > My guess either a limitation that was fixed on later kernel versions or Android
> > > > > > > kernel has some strict option/modifications that make this fail?
> > > > > > 
> > > > > > Thanks a lot for checking, yes I was testing 4.9 kernel with this patch (pixel 3).
> > > > > > 
> > > > > > I am not sure what has changed since then, but I still think it is a good
> > > > > > idea to make the code more robust against such future issues anyway. In
> > > > > > particular, we learnt with extensive discussions that user/kernel pointers
> > > > > > are not necessarily distinguishable purely based on their address.
> > > > > 
> > > > > Yes I wasn't arguing against that. But the commit message is misleading or
> > > > > needs more explanation at least. I tried 4.9.y stable and arm64 worked on that
> > > > > too. Why do you think it's an arm64 problem?
> > > > 
> > > > Well it is broken on at least on at least one arm64 device and the patch I
> > > > sent fixes it. We know that the bpf is using wrong kernel API so why not fix
> > > > it? Are you saying we should not fix it like in this patch? Or do you have
> > > > another fix in mind?
> > > 
> > > Again I have no issue with the new API. But the claim that it's a fix for
> > > a broken arm64 is a big stretch. AFAICT you don't understand the root cause of
> > > why copy_to_user_inatomic() fails in your case. Given that Android 4.9 has
> > > its own patches on top of 4.9 stable, it might be something that was introduced
> > > in one of these patches that breaks opensnoop, and by making it use the new API
> > > you might be simply working around the problem. All I can see is that vanilla
> > > 4.9 stable works on arm64.
> > 
> > Agreed that commit message could be improved. I believe issue is something to
> > do with differences in 4.9 PAN emulation backports. AIUI PAN was introduced
> > in upstream only in 4.10 so 4.9 needed backports.
> > 
> > I did not root cause this completely because "doing the right thing" fixed
> > the issue. I will look more closely once I am home.
> > 
> > Thank you.
> 
> +Mark, Will since discussion is about arm64 arch code.
> 
> The difference between observing the bug and everything just working seems to
> be the set_fs(USER_DS) as done by Masami's patch that this patch is based on.
> The following diff shows 'ret' as 255 when set_fs(KERN_DS) is used, and then
> after we retry with set_fs(USER_DS), the read succeeds.
> 
> diff --git a/mm/maccess.c b/mm/maccess.c
> index 78f9274dd49d..d3e01a33c712 100644
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -32,9 +32,20 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
>  	pagefault_disable();
>  	ret = __copy_from_user_inatomic(dst,
>  			(__force const void __user *)src, size);
> +	trace_printk("KERNEL_DS: __copy_from_user_inatomic: ret=%d\n", ret);
>  	pagefault_enable();
>  	set_fs(old_fs);
>  
> +	if (ret) {
> +	set_fs(USER_DS);
> +	pagefault_disable();
> +	ret = __copy_from_user_inatomic(dst,
> +			(__force const void __user *)src, size);
> +	trace_printk("RETRY WITH USER_DS: __copy_from_user_inatomic: ret=%d\n", ret);
> +	pagefault_enable();
> +	set_fs(old_fs);
> +	}
> +
>  	return ret ? -EFAULT : 0;
>  }
>  EXPORT_SYMBOL_GPL(probe_kernel_read);
> 
> In initially thought this was because of the addr_limit pointer masking done
> by this patch from Mark Rutland "arm64: Use pointer masking to limit uaccess
> speculation"
> 
> However removing this masking still makes it fail with KERNEL_DS.
> 
> Fwiw, I am still curious which other paths in arm64 check the addr_limit
> which might make the __copy_from_user_inatomic fail if the set_fs is not
> setup correctly.

PAN and UAO configs seem to affect its behavior. I lost access to my board to
play with this myself and will have to wait until I get back to the office on
Tuesday to revive it.

> 
> Either way, I will resubmit the patch with the commit message fixed correctly
> as we agreed and also address Alexei's comments.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-02 20:49 [PATCH RFC] bpf: Add support for reading user pointers Joel Fernandes (Google)
  2019-05-03 12:12 ` Qais Yousef
  2019-05-05  7:19 ` Alexei Starovoitov
@ 2019-05-06 14:47 ` Masami Hiramatsu
  2019-05-06 16:14   ` Joel Fernandes
  2 siblings, 1 reply; 21+ messages in thread
From: Masami Hiramatsu @ 2019-05-06 14:47 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Qais Yousef, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, netdev

Hi Joel,

On Thu,  2 May 2019 16:49:58 -0400
"Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> The eBPF based opensnoop tool fails to read the file path string passed
> to the do_sys_open function. This is because it is a pointer to
> userspace address and causes an -EFAULT when read with
> probe_kernel_read. This is not an issue when running the tool on x86 but
> is an issue on arm64. This patch adds a new bpf function call based
> which calls the recently proposed probe_user_read function [1].
> Using this function call from opensnoop fixes the issue on arm64.
> 
> [1] https://lore.kernel.org/patchwork/patch/1051588/

Anyway, this series is still out-of-tree. We have to push this or similar
update into kernel at first. I can resend v7 on the latest -tip tree including
this patch if you update the description.

Thank you,

> 
> Cc: Michal Gregorczyk <michalgr@live.com>
> Cc: Adrian Ratiu <adrian.ratiu@collabora.com>
> Cc: Mohammad Husain <russoue@gmail.com>
> Cc: Qais Yousef <qais.yousef@arm.com>
> Cc: Srinivas Ramana <sramana@codeaurora.org>
> Cc: duyuchao <yuchao.du@unisoc.com>
> Cc: Manjo Raja Rao <linux@manojrajarao.com>
> Cc: Karim Yaghmour <karim.yaghmour@opersys.com>
> Cc: Tamir Carmeli <carmeli.tamir@gmail.com>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Peter Ziljstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/uapi/linux/bpf.h       |  7 ++++++-
>  kernel/trace/bpf_trace.c       | 22 ++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  7 ++++++-
>  3 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e99e3e6f8b37..6fec701eaa46 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -539,6 +539,10 @@ union bpf_attr {
>   *     @mode: operation mode (enum bpf_adj_room_mode)
>   *     @flags: reserved for future use
>   *     Return: 0 on success or negative error code
> + *
> + * int bpf_probe_read_user(void *dst, int size, void *src)
> + *     Read a userspace pointer safely.
> + *     Return: 0 on success or negative error
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -591,7 +595,8 @@ union bpf_attr {
>  	FN(get_socket_uid),		\
>  	FN(set_hash),			\
>  	FN(setsockopt),			\
> -	FN(skb_adjust_room),
> +	FN(skb_adjust_room),		\
> +	FN(probe_read_user),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index dc498b605d5d..1e1a11d9faa8 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -81,6 +81,26 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
>  	.arg3_type	= ARG_ANYTHING,
>  };
>  
> +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, unsafe_ptr)
> +{
> +	int ret;
> +
> +	ret = probe_user_read(dst, unsafe_ptr, size);
> +	if (unlikely(ret < 0))
> +		memset(dst, 0, size);
> +
> +	return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_probe_read_user_proto = {
> +	.func		= bpf_probe_read_user,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +	.arg3_type	= ARG_ANYTHING,
> +};
> +
>  BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
>  	   u32, size)
>  {
> @@ -459,6 +479,8 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
>  		return &bpf_map_delete_elem_proto;
>  	case BPF_FUNC_probe_read:
>  		return &bpf_probe_read_proto;
> +	case BPF_FUNC_probe_read_user:
> +		return &bpf_probe_read_user_proto;
>  	case BPF_FUNC_ktime_get_ns:
>  		return &bpf_ktime_get_ns_proto;
>  	case BPF_FUNC_tail_call:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index e99e3e6f8b37..6fec701eaa46 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -539,6 +539,10 @@ union bpf_attr {
>   *     @mode: operation mode (enum bpf_adj_room_mode)
>   *     @flags: reserved for future use
>   *     Return: 0 on success or negative error code
> + *
> + * int bpf_probe_read_user(void *dst, int size, void *src)
> + *     Read a userspace pointer safely.
> + *     Return: 0 on success or negative error
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -591,7 +595,8 @@ union bpf_attr {
>  	FN(get_socket_uid),		\
>  	FN(set_hash),			\
>  	FN(setsockopt),			\
> -	FN(skb_adjust_room),
> +	FN(skb_adjust_room),		\
> +	FN(probe_read_user),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> -- 
> 2.21.0.593.g511ec345e18-goog
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-06 14:47 ` Masami Hiramatsu
@ 2019-05-06 16:14   ` Joel Fernandes
  0 siblings, 0 replies; 21+ messages in thread
From: Joel Fernandes @ 2019-05-06 16:14 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Michal Gregorczyk, Adrian Ratiu, Mohammad Husain,
	Qais Yousef, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Peter Ziljstra, Steven Rostedt, Kees Cook,
	kernel-team, Daniel Borkmann, Ingo Molnar, netdev

On Mon, May 06, 2019 at 11:47:51PM +0900, Masami Hiramatsu wrote:
> Hi Joel,
> 
> On Thu,  2 May 2019 16:49:58 -0400
> "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> 
> > The eBPF based opensnoop tool fails to read the file path string passed
> > to the do_sys_open function. This is because it is a pointer to
> > userspace address and causes an -EFAULT when read with
> > probe_kernel_read. This is not an issue when running the tool on x86 but
> > is an issue on arm64. This patch adds a new bpf function call based
> > which calls the recently proposed probe_user_read function [1].
> > Using this function call from opensnoop fixes the issue on arm64.
> > 
> > [1] https://lore.kernel.org/patchwork/patch/1051588/
> 
> Anyway, this series is still out-of-tree. We have to push this or similar
> update into kernel at first. I can resend v7 on the latest -tip tree including
> this patch if you update the description.

Sounds good. I also have to split it up and add a deprecation for the old
API. I will get this done today and then you can include them in your series,
thanks! Once I send them, could you CC bpf maintainers on the patches too in
the future? thanks.

 - Joel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-05 18:03             ` Joel Fernandes
  2019-05-05 18:51               ` Andrii Nakryiko
  2019-05-06  0:01               ` Qais Yousef
@ 2019-05-06 18:35               ` Will Deacon
  2019-05-06 20:58                 ` Joel Fernandes
  2 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2019-05-06 18:35 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Qais Yousef, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, netdev,
	Mark Rutland

Hi Joel,

On Sun, May 05, 2019 at 02:03:13PM -0400, Joel Fernandes wrote:
> +Mark, Will since discussion is about arm64 arch code.
> 
> The difference between observing the bug and everything just working seems to
> be the set_fs(USER_DS) as done by Masami's patch that this patch is based on.
> The following diff shows 'ret' as 255 when set_fs(KERN_DS) is used, and then
> after we retry with set_fs(USER_DS), the read succeeds.
> 
> diff --git a/mm/maccess.c b/mm/maccess.c
> index 78f9274dd49d..d3e01a33c712 100644
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -32,9 +32,20 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
>  	pagefault_disable();
>  	ret = __copy_from_user_inatomic(dst,
>  			(__force const void __user *)src, size);
> +	trace_printk("KERNEL_DS: __copy_from_user_inatomic: ret=%d\n", ret);
>  	pagefault_enable();
>  	set_fs(old_fs);
>  
> +	if (ret) {
> +	set_fs(USER_DS);
> +	pagefault_disable();
> +	ret = __copy_from_user_inatomic(dst,
> +			(__force const void __user *)src, size);
> +	trace_printk("RETRY WITH USER_DS: __copy_from_user_inatomic: ret=%d\n", ret);
> +	pagefault_enable();
> +	set_fs(old_fs);
> +	}
> +
>  	return ret ? -EFAULT : 0;
>  }
>  EXPORT_SYMBOL_GPL(probe_kernel_read);
> 
> In initially thought this was because of the addr_limit pointer masking done
> by this patch from Mark Rutland "arm64: Use pointer masking to limit uaccess
> speculation"
> 
> However removing this masking still makes it fail with KERNEL_DS.
> 
> Fwiw, I am still curious which other paths in arm64 check the addr_limit
> which might make the __copy_from_user_inatomic fail if the set_fs is not
> setup correctly.
> 
> Either way, I will resubmit the patch with the commit message fixed correctly
> as we agreed and also address Alexei's comments.

I'm coming at this with no background, so it's tricky to understand exactly
what's going on here. Some questions:

  * Are you seeing a failure with mainline and/or an official stable kernel?
  * What is the failing CPU? (so we can figure out which architectural
    extensions are implemented)
  * Do you have a .config anywhere? Particular, how are ARM64_PAN,
    ARM64_TTBR0_PAN and ARM64_UAO set?
  * Is the address being accessed a user or a kernel address?

If you're trying to dereference a pointer to userspace using
probe_kernel_read(), that clearly isn't going to work.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-06 18:35               ` Will Deacon
@ 2019-05-06 20:58                 ` Joel Fernandes
  2019-05-06 21:57                   ` Qais Yousef
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Fernandes @ 2019-05-06 20:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Qais Yousef, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, netdev,
	Mark Rutland

On Mon, May 06, 2019 at 07:35:06PM +0100, Will Deacon wrote:
> Hi Joel,
> 
> On Sun, May 05, 2019 at 02:03:13PM -0400, Joel Fernandes wrote:
> > +Mark, Will since discussion is about arm64 arch code.
> > 
> > The difference between observing the bug and everything just working seems to
> > be the set_fs(USER_DS) as done by Masami's patch that this patch is based on.
> > The following diff shows 'ret' as 255 when set_fs(KERN_DS) is used, and then
> > after we retry with set_fs(USER_DS), the read succeeds.
> > 
> > diff --git a/mm/maccess.c b/mm/maccess.c
> > index 78f9274dd49d..d3e01a33c712 100644
> > --- a/mm/maccess.c
> > +++ b/mm/maccess.c
> > @@ -32,9 +32,20 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
> >  	pagefault_disable();
> >  	ret = __copy_from_user_inatomic(dst,
> >  			(__force const void __user *)src, size);
> > +	trace_printk("KERNEL_DS: __copy_from_user_inatomic: ret=%d\n", ret);
> >  	pagefault_enable();
> >  	set_fs(old_fs);
> >  
> > +	if (ret) {
> > +	set_fs(USER_DS);
> > +	pagefault_disable();
> > +	ret = __copy_from_user_inatomic(dst,
> > +			(__force const void __user *)src, size);
> > +	trace_printk("RETRY WITH USER_DS: __copy_from_user_inatomic: ret=%d\n", ret);
> > +	pagefault_enable();
> > +	set_fs(old_fs);
> > +	}
> > +
> >  	return ret ? -EFAULT : 0;
> >  }
> >  EXPORT_SYMBOL_GPL(probe_kernel_read);
> > 
> > In initially thought this was because of the addr_limit pointer masking done
> > by this patch from Mark Rutland "arm64: Use pointer masking to limit uaccess
> > speculation"
> > 
> > However removing this masking still makes it fail with KERNEL_DS.
> > 
> > Fwiw, I am still curious which other paths in arm64 check the addr_limit
> > which might make the __copy_from_user_inatomic fail if the set_fs is not
> > setup correctly.
> > 
> > Either way, I will resubmit the patch with the commit message fixed correctly
> > as we agreed and also address Alexei's comments.
> 
> I'm coming at this with no background, so it's tricky to understand exactly
> what's going on here. Some questions:

No problem, I added you out of the blue so it is quite understandable :)

>   * Are you seeing a failure with mainline and/or an official stable kernel?

This issue is noticed on the Pixel3 kernel (4.9 based):
git clone https://android.googlesource.com/kernel/msm
(branch: android-msm-crosshatch-4.9-q-preview-1)

>   * What is the failing CPU? (so we can figure out which architectural
>     extensions are implemented)
From cpuinfo:
AArch64 Processor rev 12 (aarch64)
(Qualcomm SDM845 SoC). It is a Pixel 3 phone.

>   * Do you have a .config anywhere? Particular, how are ARM64_PAN,
>     ARM64_TTBR0_PAN and ARM64_UAO set?

CONFIG_ARM64_SW_TTBR0_PAN is not set
CONFIG_ARM64_PAN=y
CONFIG_ARM64_UAO=y

I wanted to say I enabled SW_TTBR0_PAN config and also got the same result.

>   * Is the address being accessed a user or a kernel address?

User. It is the second argument of do_sys_open() kernel function. kprobe
gives bpf the pointer which the bpf program dereferences with
probe_kernel_read.

> If you're trying to dereference a pointer to userspace using
> probe_kernel_read(), that clearly isn't going to work.

Ok. Thanks for confirming as well. The existing code has this bug and these
patches fix it.

 - Joel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-06 20:58                 ` Joel Fernandes
@ 2019-05-06 21:57                   ` Qais Yousef
  2019-05-07  9:52                     ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Qais Yousef @ 2019-05-06 21:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Will Deacon, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, netdev,
	Mark Rutland

On 05/06/19 16:58, Joel Fernandes wrote:
> > If you're trying to dereference a pointer to userspace using
> > probe_kernel_read(), that clearly isn't going to work.
> 
> Ok. Thanks for confirming as well. The existing code has this bug and these
> patches fix it.

5.1-rc7 and 4.9.173 stable both managed to read the path in do_sys_open() on my
Juno-r2 board using the defconfig in the tree.

--
Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-06 21:57                   ` Qais Yousef
@ 2019-05-07  9:52                     ` Will Deacon
  2019-05-08  2:00                       ` Joel Fernandes
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2019-05-07  9:52 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Joel Fernandes, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, netdev,
	Mark Rutland

On Mon, May 06, 2019 at 10:57:37PM +0100, Qais Yousef wrote:
> On 05/06/19 16:58, Joel Fernandes wrote:
> > > If you're trying to dereference a pointer to userspace using
> > > probe_kernel_read(), that clearly isn't going to work.
> > 
> > Ok. Thanks for confirming as well. The existing code has this bug and these
> > patches fix it.
> 
> 5.1-rc7 and 4.9.173 stable both managed to read the path in do_sys_open() on my
> Juno-r2 board using the defconfig in the tree.

That's not surprising: Juno-r2 only features v8.0 CPUs, so doesn't have PAN
or UAO capabilities. The SoC Joel is talking about is 8.2, so has both of
those.

Here's some background which might help...

PAN (Privileged Access Never) prevents the kernel from inadvertently
accessing userspace and will cause a page (permission) fault if such an
access is made outside of the standard uaccess routines. This means that
in those routines (e.g. get_user()) we have to toggle the PAN state in the
same way that x86 toggles SMAP. This can be expensive and was part of the
motivation for the adoption of things like unsafe_get_user() on x86.

On arm64, we have a set of so-called "translated" memory access instructions
which can be used to perform unprivileged accesses to userspace from within
the kernel even when PAN is enabled. Using these special instructions (e.g.
LDTR) in our uaccess routines can therefore remove the need to toggle PAN.
Sounds great, right? Well, that all falls apart when the uaccess routines
are used on kernel addresses as in probe_kernel_read() because they will
fault when trying to dereference a kernel pointer.

The answer is UAO (User Access Override). When UAO is set, the translated
memory access instructions behave the same as non-translated accesses.
Therefore we can toggle UAO in set_fs() so that it is set when we're using
KERNEL_DS and cleared when we're using USER_DS.

The side-effect of this is that when KERNEL_DS is set on a system that
implements both PAN and UAO, passing a user pointer to our uaccess routines
will return -EFAULT. In other words, set_fs() can be thought of as a
selector between the kernel and user address spaces, which are distinct.

Joel -- does disabling UAO in your .config "fix" the issue? If so, feel
free to use some of the above text in a commit message if it helps to
justify your change.

Cheers,

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] bpf: Add support for reading user pointers
  2019-05-07  9:52                     ` Will Deacon
@ 2019-05-08  2:00                       ` Joel Fernandes
  0 siblings, 0 replies; 21+ messages in thread
From: Joel Fernandes @ 2019-05-08  2:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Qais Yousef, linux-kernel, Michal Gregorczyk, Adrian Ratiu,
	Mohammad Husain, Srinivas Ramana, duyuchao, Manjo Raja Rao,
	Karim Yaghmour, Tamir Carmeli, Yonghong Song, Alexei Starovoitov,
	Brendan Gregg, Masami Hiramatsu, Peter Ziljstra, Steven Rostedt,
	Kees Cook, kernel-team, Daniel Borkmann, Ingo Molnar, netdev,
	Mark Rutland

On Tue, May 07, 2019 at 10:52:42AM +0100, Will Deacon wrote:
> On Mon, May 06, 2019 at 10:57:37PM +0100, Qais Yousef wrote:
> > On 05/06/19 16:58, Joel Fernandes wrote:
> > > > If you're trying to dereference a pointer to userspace using
> > > > probe_kernel_read(), that clearly isn't going to work.
> > > 
> > > Ok. Thanks for confirming as well. The existing code has this bug and these
> > > patches fix it.
> > 
> > 5.1-rc7 and 4.9.173 stable both managed to read the path in do_sys_open() on my
> > Juno-r2 board using the defconfig in the tree.
> 
> That's not surprising: Juno-r2 only features v8.0 CPUs, so doesn't have PAN
> or UAO capabilities. The SoC Joel is talking about is 8.2, so has both of
> those.
> 
> Here's some background which might help...
> 
> PAN (Privileged Access Never) prevents the kernel from inadvertently
> accessing userspace and will cause a page (permission) fault if such an
> access is made outside of the standard uaccess routines. This means that
> in those routines (e.g. get_user()) we have to toggle the PAN state in the
> same way that x86 toggles SMAP. This can be expensive and was part of the
> motivation for the adoption of things like unsafe_get_user() on x86.
> 
> On arm64, we have a set of so-called "translated" memory access instructions
> which can be used to perform unprivileged accesses to userspace from within
> the kernel even when PAN is enabled. Using these special instructions (e.g.
> LDTR) in our uaccess routines can therefore remove the need to toggle PAN.
> Sounds great, right? Well, that all falls apart when the uaccess routines
> are used on kernel addresses as in probe_kernel_read() because they will
> fault when trying to dereference a kernel pointer.
> 
> The answer is UAO (User Access Override). When UAO is set, the translated
> memory access instructions behave the same as non-translated accesses.
> Therefore we can toggle UAO in set_fs() so that it is set when we're using
> KERNEL_DS and cleared when we're using USER_DS.
> 
> The side-effect of this is that when KERNEL_DS is set on a system that
> implements both PAN and UAO, passing a user pointer to our uaccess routines
> will return -EFAULT. In other words, set_fs() can be thought of as a
> selector between the kernel and user address spaces, which are distinct.
> 
> Joel -- does disabling UAO in your .config "fix" the issue? If so, feel
> free to use some of the above text in a commit message if it helps to
> justify your change.

Disabling CONFIG_ARM64_UAO does "fix" it.

I will use the description above in the commit message as you suggested,
thanks a lot for the explantation of this!

 - Joel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-05-08  2:00 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-02 20:49 [PATCH RFC] bpf: Add support for reading user pointers Joel Fernandes (Google)
2019-05-03 12:12 ` Qais Yousef
2019-05-03 13:49   ` Joel Fernandes
2019-05-03 13:54     ` Peter Zijlstra
2019-05-03 15:09       ` Joel Fernandes
2019-05-05 11:04     ` Qais Yousef
2019-05-05 13:29       ` Joel Fernandes
2019-05-05 14:46         ` Qais Yousef
2019-05-05 15:52           ` Joel Fernandes
2019-05-05 18:03             ` Joel Fernandes
2019-05-05 18:51               ` Andrii Nakryiko
2019-05-06  0:01               ` Qais Yousef
2019-05-06 18:35               ` Will Deacon
2019-05-06 20:58                 ` Joel Fernandes
2019-05-06 21:57                   ` Qais Yousef
2019-05-07  9:52                     ` Will Deacon
2019-05-08  2:00                       ` Joel Fernandes
2019-05-05  7:19 ` Alexei Starovoitov
2019-05-05 13:33   ` Joel Fernandes
2019-05-06 14:47 ` Masami Hiramatsu
2019-05-06 16:14   ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).