* [PATCH 0/2 v2] [GIT PULL (take two)] tracing: Two more fixes @ 2019-02-15 17:47 Steven Rostedt 2019-02-15 17:47 ` [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault Steven Rostedt 2019-02-15 17:47 ` [PATCH 2/2 v2] tracing: Fix number of entries in trace header Steven Rostedt 0 siblings, 2 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-15 17:47 UTC (permalink / raw) To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton Linus, Two more tracing fixes - Have kprobes not use copy_from_user() to access kernel addresses, because kprobes can legitimately poke at bad kernel memory, which will fault. Copy from user code should never fault in kernel space. Using probe_mem_read() can handle kernel address space faulting. - Put back the entries counter in the tracing output that was accidentally removed. Please pull the latest trace-v5.0-rc4-3 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git trace-v5.0-rc4-3 Tag SHA1: 633d24cef06bed019a34e84ff82c20782449ee5c Head SHA1: 9e7382153f80ba45a0bbcd540fb77d4b15f6e966 Changbin Du (1): kprobe: Do not use uaccess functions to access kernel memory that can fault Quentin Perret (1): tracing: Fix number of entries in trace header ---- kernel/trace/trace.c | 2 ++ kernel/trace/trace_kprobe.c | 10 +--------- 2 files changed, 3 insertions(+), 9 deletions(-) Changes from v1: - Explain that the kprobe change to probe_mem_read() is because copy_from_user() can not handle bad memory in kernel space, which kprobes can cause. The code itself is unchanged. I tweaked the change log of that patch to clarify that as well. ^ permalink raw reply [flat|nested] 85+ messages in thread
* [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-15 17:47 [PATCH 0/2 v2] [GIT PULL (take two)] tracing: Two more fixes Steven Rostedt @ 2019-02-15 17:47 ` Steven Rostedt 2019-02-15 17:55 ` Linus Torvalds 2019-02-21 7:52 ` Masami Hiramatsu 2019-02-15 17:47 ` [PATCH 2/2 v2] tracing: Fix number of entries in trace header Steven Rostedt 1 sibling, 2 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-15 17:47 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du From: Changbin Du <changbin.du@gmail.com> The userspace can ask kprobe to intercept strings at any memory address, including invalid kernel address. In this case, fetch_store_strlen() would crash since it uses general usercopy function, and user access functions are no longer allowed to access kernel memory. For example, we can crash the kernel by doing something as below: $ sudo kprobe 'p:do_sys_open +0(+0(%si)):string' [ 103.620391] BUG: GPF in non-whitelisted uaccess (non-canonical address?) [ 103.622104] general protection fault: 0000 [#1] SMP PTI [ 103.623424] CPU: 10 PID: 1046 Comm: cat Not tainted 5.0.0-rc3-00130-gd73aba1-dirty #96 [ 103.625321] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-2-g628b2e6-dirty-20190104_103505-linux 04/01/2014 [ 103.628284] RIP: 0010:process_fetch_insn+0x1ab/0x4b0 [ 103.629518] Code: 10 83 80 28 2e 00 00 01 31 d2 31 ff 48 8b 74 24 28 eb 0c 81 fa ff 0f 00 00 7f 1c 85 c0 75 18 66 66 90 0f ae e8 48 63 ca 89 f8 <8a> 0c 31 66 66 90 83 c2 01 84 c9 75 dc 89 54 24 34 89 44 24 28 48 [ 103.634032] RSP: 0018:ffff88845eb37ce0 EFLAGS: 00010246 [ 103.635312] RAX: 0000000000000000 RBX: ffff888456c4e5a8 RCX: 0000000000000000 [ 103.637057] RDX: 0000000000000000 RSI: 2e646c2f6374652f RDI: 0000000000000000 [ 103.638795] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 103.640556] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 103.642297] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 103.644040] FS: 0000000000000000(0000) GS:ffff88846f000000(0000) knlGS:0000000000000000 [ 103.646019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 103.647436] CR2: 00007ffc79758038 CR3: 0000000463360006 CR4: 0000000000020ee0 [ 103.649147] Call Trace: [ 103.649781] ? sched_clock_cpu+0xc/0xa0 [ 103.650747] ? do_sys_open+0x5/0x220 [ 103.651635] kprobe_trace_func+0x303/0x380 [ 103.652645] ? do_sys_open+0x5/0x220 [ 103.653528] kprobe_dispatcher+0x45/0x50 [ 103.654682] ? do_sys_open+0x1/0x220 [ 103.655875] kprobe_ftrace_handler+0x90/0xf0 [ 103.657282] ftrace_ops_assist_func+0x54/0xf0 [ 103.658564] ? __call_rcu+0x1dc/0x280 [ 103.659482] 0xffffffffc00000bf [ 103.660384] ? __ia32_sys_open+0x20/0x20 [ 103.661682] ? do_sys_open+0x1/0x220 [ 103.662863] do_sys_open+0x5/0x220 [ 103.663988] do_syscall_64+0x60/0x210 [ 103.665201] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 103.666862] RIP: 0033:0x7fc22fadccdd [ 103.668034] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 33 f3 c3 66 0f 1f 84 00 00 00 00 00 48 8d 44 [ 103.674029] RSP: 002b:00007ffc7972c3a8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 [ 103.676512] RAX: ffffffffffffffda RBX: 0000562f86147a21 RCX: 00007fc22fadccdd [ 103.678853] RDX: 0000000000080000 RSI: 00007fc22fae1428 RDI: 00000000ffffff9c [ 103.681151] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000 [ 103.683489] R10: 0000000000000000 R11: 0000000000000287 R12: 00007fc22fce90a8 [ 103.685774] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [ 103.688056] Modules linked in: [ 103.689131] ---[ end trace 43792035c28984a1 ]--- This can be fixed by using probe_mem_read() instead, as it can handle faulting kernel memory addresses, which kprobes can legitimately do. Link: http://lkml.kernel.org/r/20190125151051.7381-1-changbin.du@gmail.com Cc: stable@vger.kernel.org Fixes: 9da3f2b7405 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> --- kernel/trace/trace_kprobe.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index d5fb09ebba8b..9eaf07f99212 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -861,22 +861,14 @@ static const struct file_operations kprobe_profile_ops = { static nokprobe_inline int fetch_store_strlen(unsigned long addr) { - mm_segment_t old_fs; int ret, len = 0; u8 c; - old_fs = get_fs(); - set_fs(KERNEL_DS); - pagefault_disable(); - do { - ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1); + ret = probe_mem_read(&c, (u8 *)addr + len, 1); len++; } while (c && ret == 0 && len < MAX_STRING_SIZE); - pagefault_enable(); - set_fs(old_fs); - return (ret < 0) ? ret : len; } -- 2.20.1 ^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-15 17:47 ` [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault Steven Rostedt @ 2019-02-15 17:55 ` Linus Torvalds 2019-02-15 22:15 ` Steven Rostedt 2019-02-21 7:52 ` Masami Hiramatsu 1 sibling, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-15 17:55 UTC (permalink / raw) To: Steven Rostedt Cc: Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du On Fri, Feb 15, 2019 at 9:49 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > From: Changbin Du <changbin.du@gmail.com> > > The userspace can ask kprobe to intercept strings at any memory address, > including invalid kernel address. Again, this is not about a "kernel address" at all. It's neither a kernel address _nor_ a user address. It's an invalid address entirely, and there is nothing that makes it "kernel". Please don't talk about "invalid kernel addresses" when it is no such thing. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-15 17:55 ` Linus Torvalds @ 2019-02-15 22:15 ` Steven Rostedt 2019-02-15 23:49 ` Andy Lutomirski 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-15 22:15 UTC (permalink / raw) To: Linus Torvalds Cc: Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, 15 Feb 2019 09:55:32 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Feb 15, 2019 at 9:49 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > From: Changbin Du <changbin.du@gmail.com> > > > > The userspace can ask kprobe to intercept strings at any memory address, > > including invalid kernel address. > > Again, this is not about a "kernel address" at all. > > It's neither a kernel address _nor_ a user address. It's an invalid > address entirely, and there is nothing that makes it "kernel". > > Please don't talk about "invalid kernel addresses" when it is no such thing. > Ah, I see what you are saying. The example of the bug that the patch fixed used a non-canonical address, which is "garbage", and not kernel or user space. Point taken. But the issue is deeper than that. This code never bugged until the following commit was added: 9da3f2b7405 "x86/fault: BUG() when uaccess helpers fault on kernel addresses" Before that commit, this worked fine. In fact, we can trigger the same bug (with a slightly different message), if we use a kernel space address. To test this, I added the following variable somewhere in the code: void *sdr_ptr = 0xffffffff70112200; And then did the following: # echo 'p:fault do_sys_open arg=+0(@sdr_ptr):string' > /debug/tracing/kprobe_events Which will read the sdr_ptr variable just like the example did with +0(+0(%si)):string. But this time I control the what address it is triggered on. And it crashed with: [ 1447.876799] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess The above message in the bug in the patch was: "BUG: GPF in non-whitelisted uaccess (non-canonical address?)" [ 1447.885053] BUG: unable to handle kernel paging request at ffffffff70112200 [ 1447.891997] #PF error: [normal kernel read fault] [ 1447.896695] PGD 8a212067 P4D 8a212067 PUD 0 [ 1447.900679] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess [ 1447.900967] Oops: 0000 [#1] PREEMPT SMP PTI [ 1447.913394] CPU: 7 PID: 1688 Comm: cat Not tainted 5.0.0-rc5-test+ #28 [ 1447.919905] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 [ 1447.928843] RIP: 0010:process_fetch_insn+0x1a0/0x470 [ 1447.933804] Code: ff f0 80 48 03 80 83 80 18 1a 00 00 01 31 c9 eb 10 48 83 c2 01 85 c0 75 1f 81 f9 ff 0f 00 00 7f 17 0f 1f 00 0f ae e8 44 89 e0 <40> 8a 32 0f 1f 00 83 c1 01 40 84 f6 75 d9 65 48 8b 14 25 00 5c 01 [ 1447.952520] RSP: 0018:ffffb77b80673d08 EFLAGS: 00010246 [ 1447.957736] RAX: 0000000000000000 RBX: ffff91bfb7ab2c30 RCX: 0000000000000000 [ 1447.964857] RDX: ffffffff70112200 RSI: ffffffff97267a80 RDI: 00007ffffffff000 [ 1447.971981] RBP: 0000000000000000 R08: ffffffff70112200 R09: ffff91c014518000 [ 1447.979105] R10: ffff91c01451801c R11: ffff91c0185e5800 R12: 0000000000000000 [ 1447.986226] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1447.993342] FS: 0000000000000000(0000) GS:ffff91c01a9c0000(0000) knlGS:0000000000000000 [ 1448.001417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1448.007157] CR2: ffffffff70112200 CR3: 00000000b3f32004 CR4: 00000000001606e0 [ 1448.014280] Call Trace: [ 1448.016737] kprobe_trace_func+0x278/0x360 [ 1448.020834] ? preempt_count_sub+0x98/0xe0 [ 1448.024931] ? do_sys_open+0x5/0x220 [ 1448.028503] kprobe_dispatcher+0x36/0x50 [ 1448.032426] ? do_sys_open+0x1/0x220 [ 1448.036002] kprobe_ftrace_handler+0xb5/0x120 [ 1448.040356] ftrace_ops_assist_func+0x87/0x110 [ 1448.044797] 0xffffffffc06a30bf [ 1448.047939] ? __ia32_sys_open+0x20/0x20 [ 1448.051860] ? do_syscall_64+0x1c/0x160 [ 1448.055694] ? do_sys_open+0x1/0x220 [ 1448.059268] do_sys_open+0x5/0x220 [ 1448.062672] do_syscall_64+0x60/0x160 [ 1448.066335] entry_SYSCALL_64_after_hwframe+0x49/0xbe Which triggered on the address: 0xffffffff70112200 Which is perfectly canonical. The reason I use "kernel address" is because this became an issue after "x86/fault: BUG() when uaccess helpers fault on kernel addresses" was added and that explicitly states "kernel address". That patch added: + /* + * This is a faulting memory access in kernel space, on a kernel + * address, in a usercopy function. This can e.g. be caused by improper + * use of helpers like __put_user and by improper attempts to access + * userspace addresses in KERNEL_DS regions. + * The one (semi-)legitimate exception are probe_kernel_{read,write}(), + * which can be invoked from places like kgdb, /dev/mem (for reading) + * and privileged BPF code (for reading). + * The probe_kernel_*() functions set the kernel_uaccess_faults_ok flag + * to tell us that faulting on kernel addresses, and even noncanonical + * addresses, in a userspace accessor does not necessarily imply a + * kernel bug, root might just be doing weird stuff. + */ + if (current->kernel_uaccess_faults_ok) + return false; + + /* This is bad. Refuse the fixup so that we go into die(). */ + if (trapnr == X86_TRAP_PF) { + pr_emerg("BUG: pagefault on kernel address 0x%lx in non-whitelisted uaccess\n", + fault_addr); + } else { + pr_emerg("BUG: GPF in non-whitelisted uaccess (non-canonical address?)\n"); + } Where this path triggered by the kprobe using copy_from_user(). So yeah, it can happen if it triggered on a non-canonical address (which is just garbage), but it can also trigger if it's a kernel address that isn't mapped either. So the comment in the code is not 100% accurate, because it isn't just a kernel address, but also a non-canonical one. Something tells me that the change log of the commit that checks this isn't written well either. What exactly can't be done now with uaccess code? I'm guessing that it should only be allowed to fault on user space addresses? So should I change this commit log to: kprobe: Do not use uaccess function to access non-user address that can fault And change all the "kernel address" mentions to "non-user address" as non-user covers both kernel address and non-canonical? -- Steve This patch allows me to modify the sdr_ptr as well from the tracing directory: diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2cf3c747a357..292fe2ef0a45 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7928,6 +7928,45 @@ static const struct file_operations buffer_percent_fops = { .llseek = default_llseek, }; +void *sdr_ptr = 0xffffffff70112200; + +static ssize_t +sdr_ptr_read(struct file *filp, char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + char buf[64]; + int r; + + r = sprintf(buf, "%px\n", sdr_ptr); + + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); +} + +static ssize_t +sdr_ptr_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + unsigned long val; + int ret; + + ret = kstrtoul_from_user(ubuf, cnt, 0, &val); + if (ret) + return ret; + + sdr_ptr = val; + + (*ppos)++; + + return cnt; +} + +static const struct file_operations sdr_ptr_fops = { + .open = tracing_open_generic, + .read = sdr_ptr_read, + .write = sdr_ptr_write, + .llseek = default_llseek, +}; + struct dentry *trace_instance_dir; static void @@ -8188,6 +8227,9 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) struct trace_event_file *file; int cpu; + trace_create_file("sdr_ptr", 0644, d_tracer, + NULL, &sdr_ptr_fops); + trace_create_file("available_tracers", 0444, d_tracer, tr, &show_traces_fops); ^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-15 22:15 ` Steven Rostedt @ 2019-02-15 23:49 ` Andy Lutomirski 2019-02-16 0:19 ` Steven Rostedt 0 siblings, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-15 23:49 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski > On Feb 15, 2019, at 2:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > On Fri, 15 Feb 2019 09:55:32 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > >>> On Fri, Feb 15, 2019 at 9:49 AM Steven Rostedt <rostedt@goodmis.org> wrote: >>> >>> From: Changbin Du <changbin.du@gmail.com> >>> >>> The userspace can ask kprobe to intercept strings at any memory address, >>> including invalid kernel address. >> >> Again, this is not about a "kernel address" at all. >> >> It's neither a kernel address _nor_ a user address. It's an invalid >> address entirely, and there is nothing that makes it "kernel". >> >> Please don't talk about "invalid kernel addresses" when it is no such thing. >> > > Ah, I see what you are saying. The example of the bug that the patch > fixed used a non-canonical address, which is "garbage", and not kernel > or user space. Point taken. > > But the issue is deeper than that. This code never bugged until the > following commit was added: > > 9da3f2b7405 "x86/fault: BUG() when uaccess helpers fault on kernel addresses" > > Before that commit, this worked fine. > > In fact, we can trigger the same bug (with a slightly different > message), if we use a kernel space address. > > To test this, I added the following variable somewhere in the code: > > void *sdr_ptr = 0xffffffff70112200; > > And then did the following: > > # echo 'p:fault do_sys_open arg=+0(@sdr_ptr):string' > /debug/tracing/kprobe_events > > Which will read the sdr_ptr variable just like the example did with +0(+0(%si)):string. > But this time I control the what address it is triggered on. > > And it crashed with: > > [ 1447.876799] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess > > The above message in the bug in the patch was: > "BUG: GPF in non-whitelisted uaccess (non-canonical address?)" > > [ 1447.885053] BUG: unable to handle kernel paging request at ffffffff70112200 > [ 1447.891997] #PF error: [normal kernel read fault] > [ 1447.896695] PGD 8a212067 P4D 8a212067 PUD 0 > [ 1447.900679] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess > [ 1447.900967] Oops: 0000 [#1] PREEMPT SMP PTI > [ 1447.913394] CPU: 7 PID: 1688 Comm: cat Not tainted 5.0.0-rc5-test+ #28 > [ 1447.919905] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 > [ 1447.928843] RIP: 0010:process_fetch_insn+0x1a0/0x470 > [ 1447.933804] Code: ff f0 80 48 03 80 83 80 18 1a 00 00 01 31 c9 eb 10 48 83 c2 01 85 c0 75 1f 81 f9 ff 0f 00 00 7f 17 0f 1f 00 0f ae e8 44 89 e0 <40> 8a 32 0f 1f 00 83 c1 01 40 84 f6 75 d9 65 48 8b 14 25 00 5c 01 > [ 1447.952520] RSP: 0018:ffffb77b80673d08 EFLAGS: 00010246 > [ 1447.957736] RAX: 0000000000000000 RBX: ffff91bfb7ab2c30 RCX: 0000000000000000 > [ 1447.964857] RDX: ffffffff70112200 RSI: ffffffff97267a80 RDI: 00007ffffffff000 > [ 1447.971981] RBP: 0000000000000000 R08: ffffffff70112200 R09: ffff91c014518000 > [ 1447.979105] R10: ffff91c01451801c R11: ffff91c0185e5800 R12: 0000000000000000 > [ 1447.986226] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 1447.993342] FS: 0000000000000000(0000) GS:ffff91c01a9c0000(0000) knlGS:0000000000000000 > [ 1448.001417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1448.007157] CR2: ffffffff70112200 CR3: 00000000b3f32004 CR4: 00000000001606e0 > [ 1448.014280] Call Trace: > [ 1448.016737] kprobe_trace_func+0x278/0x360 > [ 1448.020834] ? preempt_count_sub+0x98/0xe0 > [ 1448.024931] ? do_sys_open+0x5/0x220 > [ 1448.028503] kprobe_dispatcher+0x36/0x50 > [ 1448.032426] ? do_sys_open+0x1/0x220 > [ 1448.036002] kprobe_ftrace_handler+0xb5/0x120 > [ 1448.040356] ftrace_ops_assist_func+0x87/0x110 > [ 1448.044797] 0xffffffffc06a30bf > [ 1448.047939] ? __ia32_sys_open+0x20/0x20 > [ 1448.051860] ? do_syscall_64+0x1c/0x160 > [ 1448.055694] ? do_sys_open+0x1/0x220 > [ 1448.059268] do_sys_open+0x5/0x220 > [ 1448.062672] do_syscall_64+0x60/0x160 > [ 1448.066335] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > Which triggered on the address: 0xffffffff70112200 > > Which is perfectly canonical. > > The reason I use "kernel address" is because this became an issue after > "x86/fault: BUG() when uaccess helpers fault on kernel addresses" was > added and that explicitly states "kernel address". > > That patch added: > > + /* > + * This is a faulting memory access in kernel space, on a kernel > + * address, in a usercopy function. This can e.g. be caused by improper > + * use of helpers like __put_user and by improper attempts to access > + * userspace addresses in KERNEL_DS regions. > + * The one (semi-)legitimate exception are probe_kernel_{read,write}(), > + * which can be invoked from places like kgdb, /dev/mem (for reading) > + * and privileged BPF code (for reading). > + * The probe_kernel_*() functions set the kernel_uaccess_faults_ok flag > + * to tell us that faulting on kernel addresses, and even noncanonical > + * addresses, in a userspace accessor does not necessarily imply a > + * kernel bug, root might just be doing weird stuff. > + */ > + if (current->kernel_uaccess_faults_ok) > + return false; > + > + /* This is bad. Refuse the fixup so that we go into die(). */ > + if (trapnr == X86_TRAP_PF) { > + pr_emerg("BUG: pagefault on kernel address 0x%lx in non-whitelisted uaccess\n", > + fault_addr); > + } else { > + pr_emerg("BUG: GPF in non-whitelisted uaccess (non-canonical address?)\n"); > + } > > Where this path triggered by the kprobe using copy_from_user(). So > yeah, it can happen if it triggered on a non-canonical address (which is > just garbage), but it can also trigger if it's a kernel address that > isn't mapped either. > > So the comment in the code is not 100% accurate, because it isn't just > a kernel address, but also a non-canonical one. Something tells me that > the change log of the commit that checks this isn't written well > either. What exactly can't be done now with uaccess code? I'm guessing > that it should only be allowed to fault on user space addresses? So > should I change this commit log to: > > kprobe: Do not use uaccess function to access non-user address that can fault > > And change all the "kernel address" mentions to "non-user address" as > non-user covers both kernel address and non-canonical? > > > -- Steve > > This patch allows me to modify the sdr_ptr as well from the tracing directory: > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 2cf3c747a357..292fe2ef0a45 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -7928,6 +7928,45 @@ static const struct file_operations buffer_percent_fops = { > .llseek = default_llseek, > }; > > +void *sdr_ptr = 0xffffffff70112200; > + > +static ssize_t > +sdr_ptr_read(struct file *filp, char __user *ubuf, > + size_t cnt, loff_t *ppos) > +{ > + char buf[64]; > + int r; > + > + r = sprintf(buf, "%px\n", sdr_ptr); > + > + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); > +} > + > +static ssize_t > +sdr_ptr_write(struct file *filp, const char __user *ubuf, > + size_t cnt, loff_t *ppos) > +{ > + unsigned long val; > + int ret; > + > + ret = kstrtoul_from_user(ubuf, cnt, 0, &val); > + if (ret) > + return ret; > + > + sdr_ptr = val; > + > + (*ppos)++; > + > + return cnt; > +} > + > +static const struct file_operations sdr_ptr_fops = { > + .open = tracing_open_generic, > + .read = sdr_ptr_read, > + .write = sdr_ptr_write, > + .llseek = default_llseek, > +}; > + > struct dentry *trace_instance_dir; > > static void > @@ -8188,6 +8227,9 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) > struct trace_event_file *file; > int cpu; > > + trace_create_file("sdr_ptr", 0644, d_tracer, > + NULL, &sdr_ptr_fops); > + > trace_create_file("available_tracers", 0444, d_tracer, > tr, &show_traces_fops); > I’m missing most of the context here, but even probe_kernel_...() is unwise for a totally untrustworthy address. It could be MMIO, for example. If needed, we could come up with a safe-ish helper for tracing. For direct-map addresses, probe_kernel_...() is probably okay. Same for the current stack. Otherwise we could walk the page tables and check that the address is cacheable, I suppose, although this is slightly dubious if we don’t also check MTRRs. We could also check that the PA is in main memory, I suppose, although this may have unfortunate interactions with the MCE code. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-15 23:49 ` Andy Lutomirski @ 2019-02-16 0:19 ` Steven Rostedt 2019-02-16 1:32 ` Andy Lutomirski 2019-02-18 17:58 ` Linus Torvalds 0 siblings, 2 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-16 0:19 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, 15 Feb 2019 15:49:35 -0800 Andy Lutomirski <luto@amacapital.net> wrote: > I’m missing most of the context here, but even probe_kernel_...() is > unwise for a totally untrustworthy address. It could be MMIO, for > example. True, but kprobes are used like modules, and only allowed by root. They are used to poke literally anywhere one wants. That's the entire purpose of kprobes. > > If needed, we could come up with a safe-ish helper for tracing. For > direct-map addresses, probe_kernel_...() is probably okay. Same for > the current stack. Otherwise we could walk the page tables and check > that the address is cacheable, I suppose, although this is slightly > dubious if we don’t also check MTRRs. We could also check that the PA > is in main memory, I suppose, although this may have unfortunate > interactions with the MCE code. I added you just because I wanted help getting the change log correct, as that's what Linus was complaining about. I kept using "kernel address" when the sample bug used for the patch was really a non-canonical address (as Linus said, it's just garbage. Neither kernel or user space). But I pointed out that this can also bug if the address is canonical and in the kernel address space. The old code didn't complain about non-canonical or kernel address faulting before commit 9da3f2b7405, which only talks about kernel address space faulting (which is why I only mentioned that in my messages). Would changing all the mention of "kernel address" to "non user space" be accurate? For reference: http://lkml.kernel.org/r/20190215174945.557218316@goodmis.org http://lkml.kernel.org/r/20190215142015.860423791@goodmis.org -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-16 0:19 ` Steven Rostedt @ 2019-02-16 1:32 ` Andy Lutomirski 2019-02-16 2:08 ` Steven Rostedt 2019-02-18 17:58 ` Linus Torvalds 1 sibling, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-16 1:32 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski > On Feb 15, 2019, at 4:19 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > On Fri, 15 Feb 2019 15:49:35 -0800 > Andy Lutomirski <luto@amacapital.net> wrote: > >> I’m missing most of the context here, but even probe_kernel_...() is >> unwise for a totally untrustworthy address. It could be MMIO, for >> example. > > True, but kprobes are used like modules, and only allowed by root. They > are used to poke literally anywhere one wants. That's the entire > purpose of kprobes. > >> >> If needed, we could come up with a safe-ish helper for tracing. For >> direct-map addresses, probe_kernel_...() is probably okay. Same for >> the current stack. Otherwise we could walk the page tables and check >> that the address is cacheable, I suppose, although this is slightly >> dubious if we don’t also check MTRRs. We could also check that the PA >> is in main memory, I suppose, although this may have unfortunate >> interactions with the MCE code. > > I added you just because I wanted help getting the change log correct, > as that's what Linus was complaining about. I kept using "kernel > address" when the sample bug used for the patch was really a > non-canonical address (as Linus said, it's just garbage. Neither kernel > or user space). But I pointed out that this can also bug if the > address is canonical and in the kernel address space. The old code > didn't complain about non-canonical or kernel address faulting before > commit 9da3f2b7405, which only talks about kernel address space > faulting (which is why I only mentioned that in my messages). > > Would changing all the mention of "kernel address" to "non user space" > be accurate? > I think “kernel address” is right. It’s illegal to access anything that isn’t known to be a valid kernel address while in KERNEL_DS. The old __copy seems likely to have always been a bit bogus. BTW, what is this probe_mem_read() thing? Some minimal inspection suggests it’s a buggy reimplementation of probe_kernel_read(). Can you delete it and just use probe_kernel_read() directly? > For reference: > > http://lkml.kernel.org/r/20190215174945.557218316@goodmis.org > http://lkml.kernel.org/r/20190215142015.860423791@goodmis.org > > -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-16 1:32 ` Andy Lutomirski @ 2019-02-16 2:08 ` Steven Rostedt 2019-02-16 2:14 ` Andy Lutomirski 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-16 2:08 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, 15 Feb 2019 17:32:55 -0800 Andy Lutomirski <luto@amacapital.net> wrote: > > I added you just because I wanted help getting the change log correct, > > as that's what Linus was complaining about. I kept using "kernel > > address" when the sample bug used for the patch was really a > > non-canonical address (as Linus said, it's just garbage. Neither kernel > > or user space). But I pointed out that this can also bug if the > > address is canonical and in the kernel address space. The old code > > didn't complain about non-canonical or kernel address faulting before > > commit 9da3f2b7405, which only talks about kernel address space > > faulting (which is why I only mentioned that in my messages). > > > > Would changing all the mention of "kernel address" to "non user space" > > be accurate? > > > > I think “kernel address” is right. It’s illegal to access anything that isn’t known to be a valid kernel address while in KERNEL_DS. But an non-canonical address is not a "kernel address", and that will cause a bug too. This patch came about because it was changed that if we do a uaccess on something other than a user space address and take a fault (either because it was a non-canonical address, or a kernel address), we BUG! Where before that one patch, it would just return a fault. > > The old __copy seems likely to have always been a bit bogus. > > BTW, what is this probe_mem_read() thing? Some minimal inspection suggests it’s a buggy reimplementation of probe_kernel_read(). Can you delete it and just use probe_kernel_read() directly? Well, the issue is that we have trace_probe_tmpl.h in that same directory, which does the work for kprobes and uprobes. The trace_kprobes.c defines all the functions for handling kprobes, and trace_uprobes.c does all the handling of uprobes, then they include trace_probe_tmpl.h which does the bulk of the work. In the uprobes case, we have: static nokprobe_inline int probe_mem_read(void *dest, void *src, size_t size) { void __user *vaddr = (void __force __user *)src; return copy_from_user(dest, vaddr, size) ? -EFAULT : 0; } Because that is adding probes on userspace code. -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-16 2:08 ` Steven Rostedt @ 2019-02-16 2:14 ` Andy Lutomirski 2019-02-16 2:21 ` Steven Rostedt 0 siblings, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-16 2:14 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski > On Feb 15, 2019, at 6:08 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > On Fri, 15 Feb 2019 17:32:55 -0800 > Andy Lutomirski <luto@amacapital.net> wrote: > >>> I added you just because I wanted help getting the change log correct, >>> as that's what Linus was complaining about. I kept using "kernel >>> address" when the sample bug used for the patch was really a >>> non-canonical address (as Linus said, it's just garbage. Neither kernel >>> or user space). But I pointed out that this can also bug if the >>> address is canonical and in the kernel address space. The old code >>> didn't complain about non-canonical or kernel address faulting before >>> commit 9da3f2b7405, which only talks about kernel address space >>> faulting (which is why I only mentioned that in my messages). >>> >>> Would changing all the mention of "kernel address" to "non user space" >>> be accurate? >>> >> >> I think “kernel address” is right. It’s illegal to access anything that isn’t known to be a valid kernel address while in KERNEL_DS. > > But an non-canonical address is not a "kernel address", and that will > cause a bug too. Indeed. A non-canonical address is not known to be a valid kernel address. I stand by my slightly pedantic statement :) > This patch came about because it was changed that if > we do a uaccess on something other than a user space address and take a > fault (either because it was a non-canonical address, or a kernel > address), we BUG! Where before that one patch, it would just return a > fault. > >> >> The old __copy seems likely to have always been a bit bogus. >> >> BTW, what is this probe_mem_read() thing? Some minimal inspection suggests it’s a buggy reimplementation of probe_kernel_read(). Can you delete it and just use probe_kernel_read() directly? > > Well, the issue is that we have trace_probe_tmpl.h in that same > directory, which does the work for kprobes and uprobes. The > trace_kprobes.c defines all the functions for handling kprobes, and > trace_uprobes.c does all the handling of uprobes, then they include > trace_probe_tmpl.h which does the bulk of the work. > > In the uprobes case, we have: > > static nokprobe_inline int > probe_mem_read(void *dest, void *src, size_t size) > { > void __user *vaddr = (void __force __user *)src; > > return copy_from_user(dest, vaddr, size) ? -EFAULT : 0; > } > > Because that is adding probes on userspace code. > > Can the kprobe case call probe_kernel_read? Maybe it does already? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-16 2:14 ` Andy Lutomirski @ 2019-02-16 2:21 ` Steven Rostedt 0 siblings, 0 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-16 2:21 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, 15 Feb 2019 18:14:21 -0800 Andy Lutomirski <luto@amacapital.net> wrote: > > In the uprobes case, we have: > > > > static nokprobe_inline int > > probe_mem_read(void *dest, void *src, size_t size) > > { > > void __user *vaddr = (void __force __user *)src; > > > > return copy_from_user(dest, vaddr, size) ? -EFAULT : 0; > > } > > > > Because that is adding probes on userspace code. > > > > > > Can the kprobe case call probe_kernel_read? Maybe it does already? Yes, the probe_mem_read() is only used in the trace_probe_tmpl.h which for uprobes is the above "copy_from_user()" and for the kprobes case it is probe_kernel_read(). -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-16 0:19 ` Steven Rostedt 2019-02-16 1:32 ` Andy Lutomirski @ 2019-02-18 17:58 ` Linus Torvalds 2019-02-18 18:23 ` Linus Torvalds 1 sibling, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-18 17:58 UTC (permalink / raw) To: Steven Rostedt Cc: Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski [ Sorry for not looking at this earlier, I have family in town so I was mostly busy over the weekend... ] On Fri, Feb 15, 2019 at 4:19 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > Would changing all the mention of "kernel address" to "non user space" > be accurate? That would have worked, but in the meantime I just decided to pull the existing tag, because it's not _horribly_ misleading any more and now we're just talking details of what is a "kernel address" etc. And the patch itself is better than what we used to have. That said, I do agree with Andy that both the old _copy_from_user_inatomic() thing and the new probe_mem_read() are just fundamentally broken, nasty and slow. It just so happens that probe_mem_read() works _reasonably_ well in practice on x86. Basically, probe_mem_read() -> probe_kernel_read() really only works on true kernel addresses. And some of that is very fundamental: some architectures can have two entirely different address spaces for user and kernel memory, so if you give _any_ function an "try to read this address", it fundamentally has to be one or the other. The fact that on x86, there is a unified address space for kernel/user, and it can work for one or the other, is just happenstance (and admittedly the common case). So I've pulled the existing pull request, because it papers over one particular issue, but the real fix would require: - knowing whether it's kernel or user space you access - actually using that knowledge to then limit the addresses we are willing to probe, and _how_ we probe them. The user-space case is fairly easy: just check the address with "access_ok()", and then use _copy_user_atomic() without any set_fs() games. That should "JustWork(tm)". And if it's truly supposed to be a kernel address, then we probably need to add a "is this possibly a valid kernel data pointer" interface. Before we then do what "probe_kernel_read()" currently does. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-18 17:58 ` Linus Torvalds @ 2019-02-18 18:23 ` Linus Torvalds 2019-02-19 16:18 ` Steven Rostedt 0 siblings, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-18 18:23 UTC (permalink / raw) To: Steven Rostedt Cc: Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Mon, Feb 18, 2019 at 9:58 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So I've pulled the existing pull request, because it papers over one > particular issue, but the real fix would require: > > - knowing whether it's kernel or user space you access Actually, it would be good to split the "kernel space" access into finer granularities, because it would be good to limit those addresses by what you expect to happen. For example, on x86-64, we have a separate virtual mapping for kernel text. If you're following a data pointer, you probably shouldn't even look at that area, and just disallow it up-front. And the vmalloc space, we should probably look up the vmalloc descriptor for the address, to make sure we don't go into ioremap'ed areas (but we'd generally *do* want to follow pages for module data, or for vmap'ed stacks). And fixmap and/or percpu pointers may or may not be something that we'd want to follow. Etc. So it would be good to not just say "user or kernel", but actually say what *kind* of kernel access it expects. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-18 18:23 ` Linus Torvalds @ 2019-02-19 16:18 ` Steven Rostedt 2019-02-19 18:43 ` Linus Torvalds 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-19 16:18 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, Masami Hiramatsu [ Added Masami too. Start of thread is here: http://lkml.kernel.org/r/20190215174712.372898450@goodmis.org ] On Mon, 18 Feb 2019 10:23:44 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > So it would be good to not just say "user or kernel", but actually say > what *kind* of kernel access it expects. Note, kprobes are a different kind of beast. I've used kprobes to probe userspace information as well as kernel. Heck, I could see someone even using kprobes to probe IO memory to check if a device is doing what they expect it's doing. Basically, a kprobe is mostly used for debugging what's happening in a live kernel, to read any address. But for those that are more security minded, perhaps we could add "layers" via CONFIG or /sys files that prevent kprobes from doing certain kinds of probes? -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-19 16:18 ` Steven Rostedt @ 2019-02-19 18:43 ` Linus Torvalds 2019-02-19 19:03 ` Steven Rostedt 0 siblings, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-19 18:43 UTC (permalink / raw) To: Steven Rostedt Cc: Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, Masami Hiramatsu On Tue, Feb 19, 2019 at 8:18 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > > So it would be good to not just say "user or kernel", but actually say > > what *kind* of kernel access it expects. > > Note, kprobes are a different kind of beast. I've used kprobes to probe > userspace information as well as kernel. Heck, I could see someone > even using kprobes to probe IO memory to check if a device is doing > what they expect it's doing. Note that even if that is the case, you _need_ to special "user vs kernel" information. Because the exact same address might exist in both. Right now I think that only happens on sparc32, but vendors used to have that issue on x86-32 too (if they had the 4G:4G patches). > Basically, a kprobe is mostly used for debugging what's happening in a > live kernel, to read any address. My point is that "any address" is not sufficient to begin with. You need "kernel or user". Having a flag for what _kind_ of kernel address is ok might then be required for other cases if they might not be ok with following page tables to IO space.. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-19 18:43 ` Linus Torvalds @ 2019-02-19 19:03 ` Steven Rostedt 2019-02-20 8:10 ` Masami Hiramatsu 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-19 19:03 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, Masami Hiramatsu On Tue, 19 Feb 2019 10:43:36 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Feb 19, 2019 at 8:18 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > So it would be good to not just say "user or kernel", but actually say > > > what *kind* of kernel access it expects. > > > > Note, kprobes are a different kind of beast. I've used kprobes to probe > > userspace information as well as kernel. Heck, I could see someone > > even using kprobes to probe IO memory to check if a device is doing > > what they expect it's doing. > > Note that even if that is the case, you _need_ to special "user vs > kernel" information. > > Because the exact same address might exist in both. > > Right now I think that only happens on sparc32, but vendors used to > have that issue on x86-32 too (if they had the 4G:4G patches). Hmm, I didn't realize that Linux allowed the same address to be different depending on being in kernel or user space (learn something new everyday). I guess it makes sense, and even easier with the switch of cr3 from user to kernel. And I even knew of 4G:4G, but never used it, nor put too much thought about its implementation. > > > Basically, a kprobe is mostly used for debugging what's happening in a > > live kernel, to read any address. > > My point is that "any address" is not sufficient to begin with. You > need "kernel or user". > > Having a flag for what _kind_ of kernel address is ok might then be > required for other cases if they might not be ok with following page > tables to IO space.. > Good point. Looks like we should add a new flag for kprobe trace parameters, that tell kprobes if the address is expected to be user or kernel. That would be good regardless of the duplicate meanings, as we could use copy_from_user without touching KERNEL_DS, if the probe argument specifically states "this is user space". For example, when probing do_sys_open, and you want to read what path string was passed into the kernel. Masami, thoughts? -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-19 19:03 ` Steven Rostedt @ 2019-02-20 8:10 ` Masami Hiramatsu 2019-02-20 13:57 ` Jann Horn 2019-02-20 14:49 ` Steven Rostedt 0 siblings, 2 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-20 8:10 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, Masami Hiramatsu On Tue, 19 Feb 2019 14:03:30 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > > > Basically, a kprobe is mostly used for debugging what's happening in a > > > live kernel, to read any address. > > > > My point is that "any address" is not sufficient to begin with. You > > need "kernel or user". > > > > Having a flag for what _kind_ of kernel address is ok might then be > > required for other cases if they might not be ok with following page > > tables to IO space.. > > > > Good point. Looks like we should add a new flag for kprobe > trace parameters, that tell kprobes if the address is expected to be > user or kernel. That would be good regardless of the duplicate > meanings, as we could use copy_from_user without touching KERNEL_DS, if > the probe argument specifically states "this is user space". For > example, when probing do_sys_open, and you want to read what path string > was passed into the kernel. > > Masami, thoughts? Let me ensure what you want. So you want to access a "string" in user-space, not a data structure? In that case, it is very easy to me. It is enough to add a "ustring" type to kprobe events. For example, do_sys_opsn's path variable is one example. That will be +0(+0(%si)):ustring, and fetcher finally copy the string using strncpy_from_user() instead of strncpy_from_unsafe(). (*) But if you consider to access a field in a data-structure in user space, it might need some more work (E.g. ioctl's parameter), becase if the __user pointer to the data structure is on the memory, we have to dereference the address inside kernel using probe_kernel_read(), but after getting the data strucutre address, we have to dereference the address with copy_from_user(). At this moment, we have no such strong syntax... To solve that, maybe we need to introduce something like "back reference" of arguments in the event, e.g. p somewhere user_data=+0(%si) field_val=+8(\user_data):u32:user or p somewhere +0(%si) field_val=+8(\1):u32:user This ":user" additional suffix tells kprobe events to change fetching method to fetch the data by copy_from_user(). (*) BTW, there is another concern to use _from_user APIs in kprobe. Are those APIs might sleep?? Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 8:10 ` Masami Hiramatsu @ 2019-02-20 13:57 ` Jann Horn 2019-02-20 14:47 ` Steven Rostedt 2019-02-20 15:08 ` Masami Hiramatsu 2019-02-20 14:49 ` Steven Rostedt 1 sibling, 2 replies; 85+ messages in thread From: Jann Horn @ 2019-02-20 13:57 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Kees Cook, Andy Lutomirski On Wed, Feb 20, 2019 at 9:10 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > On Tue, 19 Feb 2019 14:03:30 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > Basically, a kprobe is mostly used for debugging what's happening in a > > > > live kernel, to read any address. > > > > > > My point is that "any address" is not sufficient to begin with. You > > > need "kernel or user". > > > > > > Having a flag for what _kind_ of kernel address is ok might then be > > > required for other cases if they might not be ok with following page > > > tables to IO space.. > > > > > > > Good point. Looks like we should add a new flag for kprobe > > trace parameters, that tell kprobes if the address is expected to be > > user or kernel. That would be good regardless of the duplicate > > meanings, as we could use copy_from_user without touching KERNEL_DS, if > > the probe argument specifically states "this is user space". For > > example, when probing do_sys_open, and you want to read what path string > > was passed into the kernel. > > > > Masami, thoughts? > > Let me ensure what you want. So you want to access a "string" in user-space, > not a data structure? In that case, it is very easy to me. It is enough to > add a "ustring" type to kprobe events. For example, do_sys_opsn's path > variable is one example. That will be +0(+0(%si)):ustring, and fetcher > finally copy the string using strncpy_from_user() instead of > strncpy_from_unsafe(). (*) [...] > (*) BTW, there is another concern to use _from_user APIs in kprobe. Are those > APIs might sleep?? If you want to access userspace without sleeping, and ignore data in non-present pages, you can do `pagefault_disable(); err = __copy_from_user_inatomic(...); pagefault_enable();`. (Actually, maybe the kernel should have a helper for that...) ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 13:57 ` Jann Horn @ 2019-02-20 14:47 ` Steven Rostedt 2019-02-20 15:08 ` Masami Hiramatsu 1 sibling, 0 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-20 14:47 UTC (permalink / raw) To: Jann Horn Cc: Masami Hiramatsu, Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Kees Cook, Andy Lutomirski On Wed, 20 Feb 2019 14:57:31 +0100 Jann Horn <jannh@google.com> wrote: > > (*) BTW, there is another concern to use _from_user APIs in kprobe. Are those > > APIs might sleep?? > > If you want to access userspace without sleeping, and ignore data in > non-present pages, you can do `pagefault_disable(); err = > __copy_from_user_inatomic(...); pagefault_enable();`. (Actually, maybe > the kernel should have a helper for that...) I was about to say pretty much the same thing (about using _inatomic()), but yeah, we should also have a helper function if we don't already to include the page_fault_disable/enable() too. -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 13:57 ` Jann Horn 2019-02-20 14:47 ` Steven Rostedt @ 2019-02-20 15:08 ` Masami Hiramatsu 1 sibling, 0 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-20 15:08 UTC (permalink / raw) To: Jann Horn Cc: Steven Rostedt, Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Kees Cook, Andy Lutomirski Hi Jann, On Wed, 20 Feb 2019 14:57:31 +0100 Jann Horn <jannh@google.com> wrote: > On Wed, Feb 20, 2019 at 9:10 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Tue, 19 Feb 2019 14:03:30 -0500 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > > Basically, a kprobe is mostly used for debugging what's happening in a > > > > > live kernel, to read any address. > > > > > > > > My point is that "any address" is not sufficient to begin with. You > > > > need "kernel or user". > > > > > > > > Having a flag for what _kind_ of kernel address is ok might then be > > > > required for other cases if they might not be ok with following page > > > > tables to IO space.. > > > > > > > > > > Good point. Looks like we should add a new flag for kprobe > > > trace parameters, that tell kprobes if the address is expected to be > > > user or kernel. That would be good regardless of the duplicate > > > meanings, as we could use copy_from_user without touching KERNEL_DS, if > > > the probe argument specifically states "this is user space". For > > > example, when probing do_sys_open, and you want to read what path string > > > was passed into the kernel. > > > > > > Masami, thoughts? > > > > Let me ensure what you want. So you want to access a "string" in user-space, > > not a data structure? In that case, it is very easy to me. It is enough to > > add a "ustring" type to kprobe events. For example, do_sys_opsn's path > > variable is one example. That will be +0(+0(%si)):ustring, and fetcher > > finally copy the string using strncpy_from_user() instead of > > strncpy_from_unsafe(). (*) > [...] > > (*) BTW, there is another concern to use _from_user APIs in kprobe. Are those > > APIs might sleep?? > > If you want to access userspace without sleeping, and ignore data in > non-present pages, you can do `pagefault_disable(); err = > __copy_from_user_inatomic(...); pagefault_enable();`. (Actually, maybe > the kernel should have a helper for that...) Ok, we are going back to the start point of this thread :) http://lkml.kernel.org/r/20190215174712.372898450@goodmis.org So, if user tells kprobe it is user-pointer, we check it with access_ok(), and will do something similar to the strnlen_user() and strncpy_from_user(), but using __copy_from_user_inatomic() and pagefault_disable() for kprobes. Thank you! -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 8:10 ` Masami Hiramatsu 2019-02-20 13:57 ` Jann Horn @ 2019-02-20 14:49 ` Steven Rostedt 2019-02-20 16:04 ` Masami Hiramatsu 2019-02-22 8:27 ` Masami Hiramatsu 1 sibling, 2 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-20 14:49 UTC (permalink / raw) To: Masami Hiramatsu Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Wed, 20 Feb 2019 17:10:19 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > Let me ensure what you want. So you want to access a "string" in user-space, > not a data structure? In that case, it is very easy to me. It is enough to > add a "ustring" type to kprobe events. For example, do_sys_opsn's path > variable is one example. That will be +0(+0(%si)):ustring, and fetcher > finally copy the string using strncpy_from_user() instead of > strncpy_from_unsafe(). (*) ustring would be good. > > But if you consider to access a field in a data-structure in user space, > it might need some more work (E.g. ioctl's parameter), becase if the __user > pointer to the data structure is on the memory, we have to dereference > the address inside kernel using probe_kernel_read(), but after getting > the data strucutre address, we have to dereference the address with copy_from_user(). > At this moment, we have no such strong syntax... > > To solve that, maybe we need to introduce something like "back reference" > of arguments in the event, e.g. > > p somewhere user_data=+0(%si) field_val=+8(\user_data):u32:user > > or > > p somewhere +0(%si) field_val=+8(\1):u32:user > > This ":user" additional suffix tells kprobe events to change fetching method > to fetch the data by copy_from_user(). What about just adding 'u' to the end of the offset? Say you have a data structure in kernel space that has a field in user space you want to reference? field_val=+8u(+0(%si)) Although, I would say having ways to access current parameters may also be a nice touch ;-) -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 14:49 ` Steven Rostedt @ 2019-02-20 16:04 ` Masami Hiramatsu 2019-02-20 16:42 ` Steven Rostedt 2019-02-22 8:27 ` Masami Hiramatsu 1 sibling, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-20 16:04 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Wed, 20 Feb 2019 09:49:26 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 20 Feb 2019 17:10:19 +0900 > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > Let me ensure what you want. So you want to access a "string" in user-space, > > not a data structure? In that case, it is very easy to me. It is enough to > > add a "ustring" type to kprobe events. For example, do_sys_opsn's path > > variable is one example. That will be +0(+0(%si)):ustring, and fetcher > > finally copy the string using strncpy_from_user() instead of > > strncpy_from_unsafe(). (*) > > ustring would be good. OK. > > > > > But if you consider to access a field in a data-structure in user space, > > it might need some more work (E.g. ioctl's parameter), becase if the __user > > pointer to the data structure is on the memory, we have to dereference > > the address inside kernel using probe_kernel_read(), but after getting > > the data strucutre address, we have to dereference the address with copy_from_user(). > > At this moment, we have no such strong syntax... > > > > To solve that, maybe we need to introduce something like "back reference" > > of arguments in the event, e.g. > > > > p somewhere user_data=+0(%si) field_val=+8(\user_data):u32:user > > > > or > > > > p somewhere +0(%si) field_val=+8(\1):u32:user > > > > This ":user" additional suffix tells kprobe events to change fetching method > > to fetch the data by copy_from_user(). > > What about just adding 'u' to the end of the offset? Say you have a > data structure in kernel space that has a field in user space you want > to reference? > > > field_val=+8u(+0(%si)) Ah, that looks good :~) thank you for this idea! > > Although, I would say having ways to access current parameters may also > be a nice touch ;-) Hehe, OK, let's implement both. Thanks! > > -- Steve -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 16:04 ` Masami Hiramatsu @ 2019-02-20 16:42 ` Steven Rostedt 2019-02-21 7:37 ` Masami Hiramatsu 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-20 16:42 UTC (permalink / raw) To: Masami Hiramatsu Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Thu, 21 Feb 2019 01:04:53 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > > What about just adding 'u' to the end of the offset? Say you have a > > data structure in kernel space that has a field in user space you want > > to reference? > > > > > > field_val=+8u(+0(%si)) > > Ah, that looks good :~) thank you for this idea! <bike-shed> Hmm, I wonder if we should make it +u8 or u+8? as +8u may be confused as unsigned? Like 8ULL. I don't know. Kernel developers suck at naming :-p </bike-shed> -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 16:42 ` Steven Rostedt @ 2019-02-21 7:37 ` Masami Hiramatsu 0 siblings, 0 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-21 7:37 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Wed, 20 Feb 2019 11:42:17 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 21 Feb 2019 01:04:53 +0900 > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > What about just adding 'u' to the end of the offset? Say you have a > > > data structure in kernel space that has a field in user space you want > > > to reference? > > > > > > > > > field_val=+8u(+0(%si)) > > > > Ah, that looks good :~) thank you for this idea! > > <bike-shed> > > Hmm, I wonder if we should make it +u8 or u+8? as +8u may be confused > as unsigned? Like 8ULL. I don't know. Kernel developers suck at > naming :-p I like +u8 since it is easier to implement :-p. Thank you, > > </bike-shed> > > -- Steve -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-20 14:49 ` Steven Rostedt 2019-02-20 16:04 ` Masami Hiramatsu @ 2019-02-22 8:27 ` Masami Hiramatsu 2019-02-22 8:35 ` Masami Hiramatsu 1 sibling, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-22 8:27 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski Hi Steve, On Wed, 20 Feb 2019 09:49:26 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 20 Feb 2019 17:10:19 +0900 > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > Let me ensure what you want. So you want to access a "string" in user-space, > > not a data structure? In that case, it is very easy to me. It is enough to > > add a "ustring" type to kprobe events. For example, do_sys_opsn's path > > variable is one example. That will be +0(+0(%si)):ustring, and fetcher > > finally copy the string using strncpy_from_user() instead of > > strncpy_from_unsafe(). (*) > > ustring would be good. I've tried to implement ustring and u-offsets, but I got some issues. - access_ok() warns if it is called in IRQ context (kprobes is.) - copy_from_user uses access_ok(), so it is not designed for irq handler. Moreover, if we have different kernel/user address spaces, we have to assign target user-pages to kernel vma. Can we do that (doesn't it involve mutex locks)? If not, I think what we can do "in kprobes" is only probe_kernel_read() and strncpy_from_unsafe(). This means, on the architechture whoes kernel address space doesn't map user space, we can not support user-space data fetching in kprobe evevnts. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 8:27 ` Masami Hiramatsu @ 2019-02-22 8:35 ` Masami Hiramatsu 2019-02-22 17:43 ` Linus Torvalds 0 siblings, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-22 8:35 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, 22 Feb 2019 17:27:45 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > Hi Steve, > > On Wed, 20 Feb 2019 09:49:26 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Wed, 20 Feb 2019 17:10:19 +0900 > > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > Let me ensure what you want. So you want to access a "string" in user-space, > > > not a data structure? In that case, it is very easy to me. It is enough to > > > add a "ustring" type to kprobe events. For example, do_sys_opsn's path > > > variable is one example. That will be +0(+0(%si)):ustring, and fetcher > > > finally copy the string using strncpy_from_user() instead of > > > strncpy_from_unsafe(). (*) > > > > ustring would be good. > > I've tried to implement ustring and u-offsets, but I got some issues. > > - access_ok() warns if it is called in IRQ context (kprobes is.) > - copy_from_user uses access_ok(), so it is not designed for irq handler. > > Moreover, if we have different kernel/user address spaces, we have to > assign target user-pages to kernel vma. Can we do that (doesn't it involve > mutex locks)? Or, can we do this? long __probe_user_read(void *dst, const void *src, size_t size) { long ret; mm_segment_t old_fs = get_fs(); set_fs(USER_DS); /* Only this is changed */ pagefault_disable(); current->kernel_uaccess_faults_ok++; ret = __copy_from_user_inatomic(dst, (__force const void __user *)src, size); current->kernel_uaccess_faults_ok--; pagefault_enable(); set_fs(old_fs); return ret ? -EFAULT : 0; } -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 8:35 ` Masami Hiramatsu @ 2019-02-22 17:43 ` Linus Torvalds 2019-02-22 17:48 ` Andy Lutomirski ` (2 more replies) 0 siblings, 3 replies; 85+ messages in thread From: Linus Torvalds @ 2019-02-22 17:43 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, Feb 22, 2019 at 12:35 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > Or, can we do this? > > long __probe_user_read(void *dst, const void *src, size_t size) > { Add a if (!access_ok(src, size)) ret = -EFAULT; else { .. do the pagefault_disable() etc .. } to after the "set_fs()", and it looks good to me. Make it clear that yes, this works _only_ for user reads. Adn that makes all the games with "kernel_uaccess_faults_ok" pointless, so you can just remove them. (note that the "access_ok()" has to come after we've done "set_fs()", because it takes the address limit from that). Also, since normally we'd expect that we already have USER_DS, it might be worthwhile to do this with a wrapper, something along the lines of mm_segment_t old_fs = get_fs(); if (segment_eq(old_fs, USER_DS)) return __normal_probe_user_read(); set_fs(USER_DS); ret = __normal_probe_user_read(); set_fs(old_fs); return ret; and have that __normal_probe_user_read() just do if (!access_ok(src, size)) return -EFAULT; pagefault_disable(); ret = __copy_from_user_inatomic(dst, ...); pagefault_enable(); return ret ? -EFAULT : 0; which looks more obvious. Also, I would suggest that you just make the argument type be "const void __user *", since the whole point is that this takes a user pointer, and nothing else. Then we should still probably fix up "__probe_kernel_read()" to not allow user accesses. The easiest way to do that is actually likely to use the "unsafe_get_user()" functions *without* doing a uaccess_begin(), which will mean that modern CPU's will simply fault on a kernel access to user space. The nice thing about that is that usually developers will have access to exactly those modern boxes, so the people who notice that it doesn't work are the right people. Alternatively, we should just make it be architecture-specific, so that architectures can decide "this address cannot be a kernel address" and refuse to do it. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 17:43 ` Linus Torvalds @ 2019-02-22 17:48 ` Andy Lutomirski 2019-02-22 18:28 ` Linus Torvalds 2019-02-22 19:27 ` Alexei Starovoitov 2019-02-23 3:47 ` Masami Hiramatsu 2 siblings, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-22 17:48 UTC (permalink / raw) To: Linus Torvalds Cc: Masami Hiramatsu, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski > On Feb 22, 2019, at 9:43 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> On Fri, Feb 22, 2019 at 12:35 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: >> >> Or, can we do this? >> >> long __probe_user_read(void *dst, const void *src, size_t size) >> { > > Add a > > if (!access_ok(src, size)) > ret = -EFAULT; > else { > .. do the pagefault_disable() etc .. > } > > to after the "set_fs()", and it looks good to me. Make it clear that > yes, this works _only_ for user reads. > > Adn that makes all the games with "kernel_uaccess_faults_ok" > pointless, so you can just remove them. > > (note that the "access_ok()" has to come after we've done "set_fs()", > because it takes the address limit from that). > > Also, since normally we'd expect that we already have USER_DS, it > might be worthwhile to do this with a wrapper, something along the > lines of > > mm_segment_t old_fs = get_fs(); > > if (segment_eq(old_fs, USER_DS)) > return __normal_probe_user_read(); > set_fs(USER_DS); > ret = __normal_probe_user_read(); > set_fs(old_fs); > return ret; > > and have that __normal_probe_user_read() just do > > if (!access_ok(src, size)) > return -EFAULT; > pagefault_disable(); > ret = __copy_from_user_inatomic(dst, ...); > pagefault_enable(); > return ret ? -EFAULT : 0; > > which looks more obvious. > > Also, I would suggest that you just make the argument type be "const > void __user *", since the whole point is that this takes a user > pointer, and nothing else. > > Then we should still probably fix up "__probe_kernel_read()" to not > allow user accesses. The easiest way to do that is actually likely to > use the "unsafe_get_user()" functions *without* doing a > uaccess_begin(), which will mean that modern CPU's will simply fault > on a kernel access to user space. > > The nice thing about that is that usually developers will have access > to exactly those modern boxes, so the people who notice that it > doesn't work are the right people. We use probe_kernel_read() from oops code. I’d rather it return -EFAULT than oops harder and kill the first oops. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 17:48 ` Andy Lutomirski @ 2019-02-22 18:28 ` Linus Torvalds 2019-02-22 19:52 ` Andy Lutomirski 0 siblings, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-22 18:28 UTC (permalink / raw) To: Andy Lutomirski Cc: Masami Hiramatsu, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, Feb 22, 2019 at 9:48 AM Andy Lutomirski <luto@amacapital.net> wrote: > > > On Feb 22, 2019, at 9:43 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > allow user accesses. The easiest way to do that is actually likely to > > use the "unsafe_get_user()" functions *without* doing a > > uaccess_begin(), which will mean that modern CPU's will simply fault > > on a kernel access to user space. > > > > The nice thing about that is that usually developers will have access > > to exactly those modern boxes, so the people who notice that it > > doesn't work are the right people. > > We use probe_kernel_read() from oops code. I’d rather it return -EFAULT than oops harder and kill the first oops. It would still do that. Using the unsafe_get_user() macros doesn't remove the exception handling, and we wouldn't remove the whole "pagefault_disable()" either. So it would work exactly the same way it does now, except on a modern CPU it would return -EFAULT for a user space access due to AC not being set. So no "oops harder", only "safer accesses". Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 18:28 ` Linus Torvalds @ 2019-02-22 19:52 ` Andy Lutomirski 0 siblings, 0 replies; 85+ messages in thread From: Andy Lutomirski @ 2019-02-22 19:52 UTC (permalink / raw) To: Linus Torvalds Cc: Masami Hiramatsu, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski > On Feb 22, 2019, at 10:28 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> On Fri, Feb 22, 2019 at 9:48 AM Andy Lutomirski <luto@amacapital.net> wrote: >> >>> On Feb 22, 2019, at 9:43 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: >>> >>> Then we should still probably fix up "__probe_kernel_read()" to not >>> allow user accesses. The easiest way to do that is actually likely to >>> use the "unsafe_get_user()" functions *without* doing a >>> uaccess_begin(), which will mean that modern CPU's will simply fault >>> on a kernel access to user space. >>> >>> The nice thing about that is that usually developers will have access >>> to exactly those modern boxes, so the people who notice that it >>> doesn't work are the right people. >> >> We use probe_kernel_read() from oops code. I’d rather it return -EFAULT than oops harder and kill the first oops. > > It would still do that. > > Using the unsafe_get_user() macros doesn't remove the exception > handling, and we wouldn't remove the whole "pagefault_disable()" > either. So it would work exactly the same way it does now, except on a > modern CPU it would return -EFAULT for a user space access due to AC > not being set. > > Hmm. I misunderstood you. I thought you wanted the oops. We’d have to check that we don’t trip the “SMAP violation, egads!” check. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 17:43 ` Linus Torvalds 2019-02-22 17:48 ` Andy Lutomirski @ 2019-02-22 19:27 ` Alexei Starovoitov 2019-02-22 19:30 ` Steven Rostedt ` (2 more replies) 2019-02-23 3:47 ` Masami Hiramatsu 2 siblings, 3 replies; 85+ messages in thread From: Alexei Starovoitov @ 2019-02-22 19:27 UTC (permalink / raw) To: Linus Torvalds Cc: Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > Then we should still probably fix up "__probe_kernel_read()" to not > allow user accesses. The easiest way to do that is actually likely to > use the "unsafe_get_user()" functions *without* doing a > uaccess_begin(), which will mean that modern CPU's will simply fault > on a kernel access to user space. On bpf side the bpf_probe_read() helper just calls probe_kernel_read() and users pass both user and kernel addresses into it and expect that the helper will actually try to read from that address. If __probe_kernel_read will suddenly start failing on all user addresses it will break the expectations. How do we solve it in bpf_probe_read? Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte in the loop? That's doable, but people already complain that bpf_probe_read() is slow and shows up in their perf report. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:27 ` Alexei Starovoitov @ 2019-02-22 19:30 ` Steven Rostedt 2019-02-22 19:34 ` Alexei Starovoitov 2019-02-22 21:20 ` Linus Torvalds 2019-02-26 15:24 ` Joel Fernandes 2 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-22 19:30 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linus Torvalds, Masami Hiramatsu, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf On Fri, 22 Feb 2019 11:27:05 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > allow user accesses. The easiest way to do that is actually likely to > > use the "unsafe_get_user()" functions *without* doing a > > uaccess_begin(), which will mean that modern CPU's will simply fault > > on a kernel access to user space. > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > and users pass both user and kernel addresses into it and expect > that the helper will actually try to read from that address. > > If __probe_kernel_read will suddenly start failing on all user addresses > it will break the expectations. > How do we solve it in bpf_probe_read? > Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > in the loop? > That's doable, but people already complain that bpf_probe_read() is slow > and shows up in their perf report. We're changing kprobes to add a specific flag to say that we want to differentiate between kernel or user reads. Can this be done with bpf_probe_read()? If it's showing up in perf report, I doubt a single check is going to cause an issue. In fact, it may actually help speed things up as the read will be optimized for either user or kernel address reading. -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:30 ` Steven Rostedt @ 2019-02-22 19:34 ` Alexei Starovoitov 2019-02-22 19:39 ` Steven Rostedt 2019-02-22 19:55 ` Andy Lutomirski 0 siblings, 2 replies; 85+ messages in thread From: Alexei Starovoitov @ 2019-02-22 19:34 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Masami Hiramatsu, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > On Fri, 22 Feb 2019 11:27:05 -0800 > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > > allow user accesses. The easiest way to do that is actually likely to > > > use the "unsafe_get_user()" functions *without* doing a > > > uaccess_begin(), which will mean that modern CPU's will simply fault > > > on a kernel access to user space. > > > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > > and users pass both user and kernel addresses into it and expect > > that the helper will actually try to read from that address. > > > > If __probe_kernel_read will suddenly start failing on all user addresses > > it will break the expectations. > > How do we solve it in bpf_probe_read? > > Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > > in the loop? > > That's doable, but people already complain that bpf_probe_read() is slow > > and shows up in their perf report. > > We're changing kprobes to add a specific flag to say that we want to > differentiate between kernel or user reads. Can this be done with > bpf_probe_read()? If it's showing up in perf report, I doubt a single so you're saying you will break existing kprobe scripts? I don't think it's a good idea. It's not acceptable to break bpf_probe_read uapi. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:34 ` Alexei Starovoitov @ 2019-02-22 19:39 ` Steven Rostedt 2019-02-22 19:55 ` Andy Lutomirski 1 sibling, 0 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-22 19:39 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linus Torvalds, Masami Hiramatsu, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf On Fri, 22 Feb 2019 11:34:58 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > so you're saying you will break existing kprobe scripts? Yes we may. > I don't think it's a good idea. > It's not acceptable to break bpf_probe_read uapi. Then you may need to add more code to determine if the address is user space or not in the kernel, and then go the appropriate route, before calling probe_kernel_read(). -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:34 ` Alexei Starovoitov 2019-02-22 19:39 ` Steven Rostedt @ 2019-02-22 19:55 ` Andy Lutomirski 2019-02-22 21:43 ` Jann Horn 1 sibling, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-22 19:55 UTC (permalink / raw) To: Alexei Starovoitov Cc: Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf > On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: >> On Fri, 22 Feb 2019 11:27:05 -0800 >> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >> >>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: >>>> >>>> Then we should still probably fix up "__probe_kernel_read()" to not >>>> allow user accesses. The easiest way to do that is actually likely to >>>> use the "unsafe_get_user()" functions *without* doing a >>>> uaccess_begin(), which will mean that modern CPU's will simply fault >>>> on a kernel access to user space. >>> >>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >>> and users pass both user and kernel addresses into it and expect >>> that the helper will actually try to read from that address. >>> >>> If __probe_kernel_read will suddenly start failing on all user addresses >>> it will break the expectations. >>> How do we solve it in bpf_probe_read? >>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte >>> in the loop? >>> That's doable, but people already complain that bpf_probe_read() is slow >>> and shows up in their perf report. >> >> We're changing kprobes to add a specific flag to say that we want to >> differentiate between kernel or user reads. Can this be done with >> bpf_probe_read()? If it's showing up in perf report, I doubt a single > > so you're saying you will break existing kprobe scripts? > I don't think it's a good idea. > It's not acceptable to break bpf_probe_read uapi. > If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. What to do about existing scripts is a different question. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:55 ` Andy Lutomirski @ 2019-02-22 21:43 ` Jann Horn 2019-02-22 22:08 ` Nadav Amit 0 siblings, 1 reply; 85+ messages in thread From: Jann Horn @ 2019-02-22 21:43 UTC (permalink / raw) To: Andy Lutomirski Cc: Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Nadav Amit, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel) (adding some people from the text_poke series to the thread, removing stable@) On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: > > On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > >> On Fri, 22 Feb 2019 11:27:05 -0800 > >> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >> > >>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > >>>> > >>>> Then we should still probably fix up "__probe_kernel_read()" to not > >>>> allow user accesses. The easiest way to do that is actually likely to > >>>> use the "unsafe_get_user()" functions *without* doing a > >>>> uaccess_begin(), which will mean that modern CPU's will simply fault > >>>> on a kernel access to user space. > >>> > >>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > >>> and users pass both user and kernel addresses into it and expect > >>> that the helper will actually try to read from that address. > >>> > >>> If __probe_kernel_read will suddenly start failing on all user addresses > >>> it will break the expectations. > >>> How do we solve it in bpf_probe_read? > >>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > >>> in the loop? > >>> That's doable, but people already complain that bpf_probe_read() is slow > >>> and shows up in their perf report. > >> > >> We're changing kprobes to add a specific flag to say that we want to > >> differentiate between kernel or user reads. Can this be done with > >> bpf_probe_read()? If it's showing up in perf report, I doubt a single > > > > so you're saying you will break existing kprobe scripts? > > I don't think it's a good idea. > > It's not acceptable to break bpf_probe_read uapi. > > > > If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. > > What to do about existing scripts is a different question. This lack of logical separation between user and kernel addresses might interact interestingly with the text_poke series, specifically "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for patching" (https://lore.kernel.org/lkml/20190221234451.17632-6-rick.p.edgecombe@intel.com/) and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text poking" (https://lore.kernel.org/lkml/20190221234451.17632-7-rick.p.edgecombe@intel.com/), right? If someone manages to get a tracing BPF program to trigger in a task that has switched to the patching mm, could they use bpf_probe_write_user() - which uses probe_kernel_write() after checking that KERNEL_DS isn't active and that access_ok() passes - to overwrite kernel text that is mapped writable in the patching mm? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 21:43 ` Jann Horn @ 2019-02-22 22:08 ` Nadav Amit 2019-02-22 22:17 ` Jann Horn 0 siblings, 1 reply; 85+ messages in thread From: Nadav Amit @ 2019-02-22 22:08 UTC (permalink / raw) To: Jann Horn Cc: Andy Lutomirski, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel) > On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: > > (adding some people from the text_poke series to the thread, removing stable@) > > On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: >>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: >>>> On Fri, 22 Feb 2019 11:27:05 -0800 >>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>> >>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: >>>>>> >>>>>> Then we should still probably fix up "__probe_kernel_read()" to not >>>>>> allow user accesses. The easiest way to do that is actually likely to >>>>>> use the "unsafe_get_user()" functions *without* doing a >>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault >>>>>> on a kernel access to user space. >>>>> >>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >>>>> and users pass both user and kernel addresses into it and expect >>>>> that the helper will actually try to read from that address. >>>>> >>>>> If __probe_kernel_read will suddenly start failing on all user addresses >>>>> it will break the expectations. >>>>> How do we solve it in bpf_probe_read? >>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte >>>>> in the loop? >>>>> That's doable, but people already complain that bpf_probe_read() is slow >>>>> and shows up in their perf report. >>>> >>>> We're changing kprobes to add a specific flag to say that we want to >>>> differentiate between kernel or user reads. Can this be done with >>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single >>> >>> so you're saying you will break existing kprobe scripts? >>> I don't think it's a good idea. >>> It's not acceptable to break bpf_probe_read uapi. >> >> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. >> >> What to do about existing scripts is a different question. > > This lack of logical separation between user and kernel addresses > might interact interestingly with the text_poke series, specifically > "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for > patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cd44d6f0765dd49b20db708d6990ee7e8%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864686717142892&sdata=gVALdkEULEhj4iJNEWAGxyYWe2lxnHRdamW5ZA2A5RQ%3D&reserved=0) > and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text > poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cd44d6f0765dd49b20db708d6990ee7e8%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864686717142892&sdata=nu2J1FtJsZJmt53SKJz8C8ktWE9eycwdAA%2BiCi1TfCc%3D&reserved=0), > right? If someone manages to get a tracing BPF program to trigger in a > task that has switched to the patching mm, could they use > bpf_probe_write_user() - which uses probe_kernel_write() after > checking that KERNEL_DS isn't active and that access_ok() passes - to > overwrite kernel text that is mapped writable in the patching mm? Yes, this is a good point. I guess text_poke() should be defined with “__kprobes” and open-code memcpy. Does it sound reasonable? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 22:08 ` Nadav Amit @ 2019-02-22 22:17 ` Jann Horn 2019-02-22 22:21 ` Nadav Amit 0 siblings, 1 reply; 85+ messages in thread From: Jann Horn @ 2019-02-22 22:17 UTC (permalink / raw) To: Nadav Amit Cc: Andy Lutomirski, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel) On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: > > On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: > > > > (adding some people from the text_poke series to the thread, removing stable@) > > > > On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: > >>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > >>>> On Fri, 22 Feb 2019 11:27:05 -0800 > >>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >>>> > >>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > >>>>>> > >>>>>> Then we should still probably fix up "__probe_kernel_read()" to not > >>>>>> allow user accesses. The easiest way to do that is actually likely to > >>>>>> use the "unsafe_get_user()" functions *without* doing a > >>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault > >>>>>> on a kernel access to user space. > >>>>> > >>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > >>>>> and users pass both user and kernel addresses into it and expect > >>>>> that the helper will actually try to read from that address. > >>>>> > >>>>> If __probe_kernel_read will suddenly start failing on all user addresses > >>>>> it will break the expectations. > >>>>> How do we solve it in bpf_probe_read? > >>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > >>>>> in the loop? > >>>>> That's doable, but people already complain that bpf_probe_read() is slow > >>>>> and shows up in their perf report. > >>>> > >>>> We're changing kprobes to add a specific flag to say that we want to > >>>> differentiate between kernel or user reads. Can this be done with > >>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single > >>> > >>> so you're saying you will break existing kprobe scripts? > >>> I don't think it's a good idea. > >>> It's not acceptable to break bpf_probe_read uapi. > >> > >> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. > >> > >> What to do about existing scripts is a different question. > > > > This lack of logical separation between user and kernel addresses > > might interact interestingly with the text_poke series, specifically > > "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for > > patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cd44d6f0765dd49b20db708d6990ee7e8%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864686717142892&sdata=gVALdkEULEhj4iJNEWAGxyYWe2lxnHRdamW5ZA2A5RQ%3D&reserved=0) > > and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text > > poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cd44d6f0765dd49b20db708d6990ee7e8%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864686717142892&sdata=nu2J1FtJsZJmt53SKJz8C8ktWE9eycwdAA%2BiCi1TfCc%3D&reserved=0), > > right? If someone manages to get a tracing BPF program to trigger in a > > task that has switched to the patching mm, could they use > > bpf_probe_write_user() - which uses probe_kernel_write() after > > checking that KERNEL_DS isn't active and that access_ok() passes - to > > overwrite kernel text that is mapped writable in the patching mm? > > Yes, this is a good point. I guess text_poke() should be defined with > “__kprobes” and open-code memcpy. > > Does it sound reasonable? Doesn't __text_poke() as implemented in the proposed patch use a couple other kernel functions, too? Like switch_mm_irqs_off() and pte_clear() (which can be a call into a separate function on paravirt kernels)? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 22:17 ` Jann Horn @ 2019-02-22 22:21 ` Nadav Amit 2019-02-22 22:39 ` Nadav Amit 0 siblings, 1 reply; 85+ messages in thread From: Nadav Amit @ 2019-02-22 22:21 UTC (permalink / raw) To: Jann Horn Cc: Andy Lutomirski, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel) > On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: > > On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: >>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: >>> >>> (adding some people from the text_poke series to the thread, removing stable@) >>> >>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: >>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: >>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 >>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>> >>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: >>>>>>>> >>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not >>>>>>>> allow user accesses. The easiest way to do that is actually likely to >>>>>>>> use the "unsafe_get_user()" functions *without* doing a >>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault >>>>>>>> on a kernel access to user space. >>>>>>> >>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >>>>>>> and users pass both user and kernel addresses into it and expect >>>>>>> that the helper will actually try to read from that address. >>>>>>> >>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses >>>>>>> it will break the expectations. >>>>>>> How do we solve it in bpf_probe_read? >>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte >>>>>>> in the loop? >>>>>>> That's doable, but people already complain that bpf_probe_read() is slow >>>>>>> and shows up in their perf report. >>>>>> >>>>>> We're changing kprobes to add a specific flag to say that we want to >>>>>> differentiate between kernel or user reads. Can this be done with >>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single >>>>> >>>>> so you're saying you will break existing kprobe scripts? >>>>> I don't think it's a good idea. >>>>> It's not acceptable to break bpf_probe_read uapi. >>>> >>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. >>>> >>>> What to do about existing scripts is a different question. >>> >>> This lack of logical separation between user and kernel addresses >>> might interact interestingly with the text_poke series, specifically >>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for >>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=HAbnDcrBne64JyPuVUMKmM7nQk67F%2BFvjuXEn8TmHeo%3D&reserved=0) >>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text >>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=vNRIMKtFDy%2F3z5FlTwDiJY6VGEV%2FMHgQPTdFSFtCo4s%3D&reserved=0), >>> right? If someone manages to get a tracing BPF program to trigger in a >>> task that has switched to the patching mm, could they use >>> bpf_probe_write_user() - which uses probe_kernel_write() after >>> checking that KERNEL_DS isn't active and that access_ok() passes - to >>> overwrite kernel text that is mapped writable in the patching mm? >> >> Yes, this is a good point. I guess text_poke() should be defined with >> “__kprobes” and open-code memcpy. >> >> Does it sound reasonable? > > Doesn't __text_poke() as implemented in the proposed patch use a > couple other kernel functions, too? Like switch_mm_irqs_off() and > pte_clear() (which can be a call into a separate function on paravirt > kernels)? I will move the pte_clear() to be done after the poking mm was unloaded. Give me a few minutes to send a sketch of what I think should be done. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 22:21 ` Nadav Amit @ 2019-02-22 22:39 ` Nadav Amit 2019-02-22 23:02 ` Jann Horn 0 siblings, 1 reply; 85+ messages in thread From: Nadav Amit @ 2019-02-22 22:39 UTC (permalink / raw) To: Jann Horn Cc: Andy Lutomirski, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa > On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: > >> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: >> >> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: >>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: >>>> >>>> (adding some people from the text_poke series to the thread, removing stable@) >>>> >>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: >>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: >>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 >>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>>> >>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: >>>>>>>>> >>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not >>>>>>>>> allow user accesses. The easiest way to do that is actually likely to >>>>>>>>> use the "unsafe_get_user()" functions *without* doing a >>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault >>>>>>>>> on a kernel access to user space. >>>>>>>> >>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >>>>>>>> and users pass both user and kernel addresses into it and expect >>>>>>>> that the helper will actually try to read from that address. >>>>>>>> >>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses >>>>>>>> it will break the expectations. >>>>>>>> How do we solve it in bpf_probe_read? >>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte >>>>>>>> in the loop? >>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow >>>>>>>> and shows up in their perf report. >>>>>>> >>>>>>> We're changing kprobes to add a specific flag to say that we want to >>>>>>> differentiate between kernel or user reads. Can this be done with >>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single >>>>>> >>>>>> so you're saying you will break existing kprobe scripts? >>>>>> I don't think it's a good idea. >>>>>> It's not acceptable to break bpf_probe_read uapi. >>>>> >>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. >>>>> >>>>> What to do about existing scripts is a different question. >>>> >>>> This lack of logical separation between user and kernel addresses >>>> might interact interestingly with the text_poke series, specifically >>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for >>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=HAbnDcrBne64JyPuVUMKmM7nQk67F%2BFvjuXEn8TmHeo%3D&reserved=0) >>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text >>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=vNRIMKtFDy%2F3z5FlTwDiJY6VGEV%2FMHgQPTdFSFtCo4s%3D&reserved=0), >>>> right? If someone manages to get a tracing BPF program to trigger in a >>>> task that has switched to the patching mm, could they use >>>> bpf_probe_write_user() - which uses probe_kernel_write() after >>>> checking that KERNEL_DS isn't active and that access_ok() passes - to >>>> overwrite kernel text that is mapped writable in the patching mm? >>> >>> Yes, this is a good point. I guess text_poke() should be defined with >>> “__kprobes” and open-code memcpy. >>> >>> Does it sound reasonable? >> >> Doesn't __text_poke() as implemented in the proposed patch use a >> couple other kernel functions, too? Like switch_mm_irqs_off() and >> pte_clear() (which can be a call into a separate function on paravirt >> kernels)? > > I will move the pte_clear() to be done after the poking mm was unloaded. > Give me a few minutes to send a sketch of what I think should be done. Err.. You are right, I don’t see an easy way of preventing a kprobe from being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. The reasonable solution seems to me as taking all the relevant pieces of code (and data) that might be used during text-poking and encapsulating them, so they will be set in a memory area which cannot be kprobe'd. This can also be useful to write-protect data structures of code that calls text_poke(), e.g., static-keys. It can also protect data on that stack that is used during text_poke() from being overwritten from another core. This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” when doing write-rarely operations. Right now, I think that text_poke() will keep being susceptible to such an attack, unless you have a better suggestion. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 22:39 ` Nadav Amit @ 2019-02-22 23:02 ` Jann Horn 2019-02-22 23:22 ` Nadav Amit 2019-02-22 23:59 ` Andy Lutomirski 0 siblings, 2 replies; 85+ messages in thread From: Jann Horn @ 2019-02-22 23:02 UTC (permalink / raw) To: Nadav Amit Cc: Andy Lutomirski, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote: > > On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: > > > >> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: > >> > >> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: > >>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: > >>>> > >>>> (adding some people from the text_poke series to the thread, removing stable@) > >>>> > >>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: > >>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > >>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 > >>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >>>>>>> > >>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > >>>>>>>>> > >>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not > >>>>>>>>> allow user accesses. The easiest way to do that is actually likely to > >>>>>>>>> use the "unsafe_get_user()" functions *without* doing a > >>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault > >>>>>>>>> on a kernel access to user space. > >>>>>>>> > >>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > >>>>>>>> and users pass both user and kernel addresses into it and expect > >>>>>>>> that the helper will actually try to read from that address. > >>>>>>>> > >>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses > >>>>>>>> it will break the expectations. > >>>>>>>> How do we solve it in bpf_probe_read? > >>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > >>>>>>>> in the loop? > >>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow > >>>>>>>> and shows up in their perf report. > >>>>>>> > >>>>>>> We're changing kprobes to add a specific flag to say that we want to > >>>>>>> differentiate between kernel or user reads. Can this be done with > >>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single > >>>>>> > >>>>>> so you're saying you will break existing kprobe scripts? > >>>>>> I don't think it's a good idea. > >>>>>> It's not acceptable to break bpf_probe_read uapi. > >>>>> > >>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. > >>>>> > >>>>> What to do about existing scripts is a different question. > >>>> > >>>> This lack of logical separation between user and kernel addresses > >>>> might interact interestingly with the text_poke series, specifically > >>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for > >>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=HAbnDcrBne64JyPuVUMKmM7nQk67F%2BFvjuXEn8TmHeo%3D&reserved=0) > >>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text > >>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=vNRIMKtFDy%2F3z5FlTwDiJY6VGEV%2FMHgQPTdFSFtCo4s%3D&reserved=0), > >>>> right? If someone manages to get a tracing BPF program to trigger in a > >>>> task that has switched to the patching mm, could they use > >>>> bpf_probe_write_user() - which uses probe_kernel_write() after > >>>> checking that KERNEL_DS isn't active and that access_ok() passes - to > >>>> overwrite kernel text that is mapped writable in the patching mm? > >>> > >>> Yes, this is a good point. I guess text_poke() should be defined with > >>> “__kprobes” and open-code memcpy. > >>> > >>> Does it sound reasonable? > >> > >> Doesn't __text_poke() as implemented in the proposed patch use a > >> couple other kernel functions, too? Like switch_mm_irqs_off() and > >> pte_clear() (which can be a call into a separate function on paravirt > >> kernels)? > > > > I will move the pte_clear() to be done after the poking mm was unloaded. > > Give me a few minutes to send a sketch of what I think should be done. > > Err.. You are right, I don’t see an easy way of preventing a kprobe from > being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. > > The reasonable solution seems to me as taking all the relevant pieces of > code (and data) that might be used during text-poking and encapsulating them, so they > will be set in a memory area which cannot be kprobe'd. This can also be > useful to write-protect data structures of code that calls text_poke(), > e.g., static-keys. It can also protect data on that stack that is used > during text_poke() from being overwritten from another core. > > This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” > when doing write-rarely operations. > > Right now, I think that text_poke() will keep being susceptible to such > an attack, unless you have a better suggestion. A relatively simple approach might be to teach BPF not to run kprobe programs and such in contexts where current->mm isn't the active mm? Maybe using nmi_uaccess_okay(), or something like that? It looks like perf_callchain_user() also already uses that. Except that a lot of this code is x86-specific... ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:02 ` Jann Horn @ 2019-02-22 23:22 ` Nadav Amit 2019-02-22 23:59 ` Andy Lutomirski 1 sibling, 0 replies; 85+ messages in thread From: Nadav Amit @ 2019-02-22 23:22 UTC (permalink / raw) To: Jann Horn Cc: Andy Lutomirski, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa > On Feb 22, 2019, at 3:02 PM, Jann Horn <jannh@google.com> wrote: > > On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote: >>> On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: >>> >>>> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: >>>> >>>> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: >>>>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: >>>>>> >>>>>> (adding some people from the text_poke series to the thread, removing stable@) >>>>>> >>>>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: >>>>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: >>>>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 >>>>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>>>>> >>>>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: >>>>>>>>>>> >>>>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not >>>>>>>>>>> allow user accesses. The easiest way to do that is actually likely to >>>>>>>>>>> use the "unsafe_get_user()" functions *without* doing a >>>>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault >>>>>>>>>>> on a kernel access to user space. >>>>>>>>>> >>>>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >>>>>>>>>> and users pass both user and kernel addresses into it and expect >>>>>>>>>> that the helper will actually try to read from that address. >>>>>>>>>> >>>>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses >>>>>>>>>> it will break the expectations. >>>>>>>>>> How do we solve it in bpf_probe_read? >>>>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte >>>>>>>>>> in the loop? >>>>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow >>>>>>>>>> and shows up in their perf report. >>>>>>>>> >>>>>>>>> We're changing kprobes to add a specific flag to say that we want to >>>>>>>>> differentiate between kernel or user reads. Can this be done with >>>>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single >>>>>>>> >>>>>>>> so you're saying you will break existing kprobe scripts? >>>>>>>> I don't think it's a good idea. >>>>>>>> It's not acceptable to break bpf_probe_read uapi. >>>>>>> >>>>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. >>>>>>> >>>>>>> What to do about existing scripts is a different question. >>>>>> >>>>>> This lack of logical separation between user and kernel addresses >>>>>> might interact interestingly with the text_poke series, specifically >>>>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for >>>>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cd03df2db76624da8eb2008d69919e41a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864733821233906&sdata=ky5iTrsCceoPwVW5N9FB4sDspwGEQ8MTlRE4b1Bqn54%3D&reserved=0) >>>>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text >>>>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cd03df2db76624da8eb2008d69919e41a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864733821233906&sdata=EJs8doLrfFfMTKiVHmWjmpnpWolmuuZ5pxOmEMcI0ew%3D&reserved=0), >>>>>> right? If someone manages to get a tracing BPF program to trigger in a >>>>>> task that has switched to the patching mm, could they use >>>>>> bpf_probe_write_user() - which uses probe_kernel_write() after >>>>>> checking that KERNEL_DS isn't active and that access_ok() passes - to >>>>>> overwrite kernel text that is mapped writable in the patching mm? >>>>> >>>>> Yes, this is a good point. I guess text_poke() should be defined with >>>>> “__kprobes” and open-code memcpy. >>>>> >>>>> Does it sound reasonable? >>>> >>>> Doesn't __text_poke() as implemented in the proposed patch use a >>>> couple other kernel functions, too? Like switch_mm_irqs_off() and >>>> pte_clear() (which can be a call into a separate function on paravirt >>>> kernels)? >>> >>> I will move the pte_clear() to be done after the poking mm was unloaded. >>> Give me a few minutes to send a sketch of what I think should be done. >> >> Err.. You are right, I don’t see an easy way of preventing a kprobe from >> being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. >> >> The reasonable solution seems to me as taking all the relevant pieces of >> code (and data) that might be used during text-poking and encapsulating them, so they >> will be set in a memory area which cannot be kprobe'd. This can also be >> useful to write-protect data structures of code that calls text_poke(), >> e.g., static-keys. It can also protect data on that stack that is used >> during text_poke() from being overwritten from another core. >> >> This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” >> when doing write-rarely operations. >> >> Right now, I think that text_poke() will keep being susceptible to such >> an attack, unless you have a better suggestion. > > A relatively simple approach might be to teach BPF not to run kprobe > programs and such in contexts where current->mm isn't the active mm? > Maybe using nmi_uaccess_okay(), or something like that? It looks like > perf_callchain_user() also already uses that. Except that a lot of > this code is x86-specific… I keep having in mind how to reduce the TCB that is used while text_poke() is running, but for the specific issue here, I think your approach would be fine, and trace_call_bpf() can be modified to do what you ask for. But, I am not sure that relying on current->mm gets us any more security, relatively to having another unrelated explicit kprobe-disable indication, which is cleaner from design point-of-view. I can see how we get “some more security” if our decision whether kprobes should be enabled was purely based on some hardware register (e.g., CR3) and we could unequivocally realize whether kprobes eBPF should be on/off without memory accesses (e.g., PCID bit set). Yet, I am not sure it worth it. What do you say? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:02 ` Jann Horn 2019-02-22 23:22 ` Nadav Amit @ 2019-02-22 23:59 ` Andy Lutomirski 2019-02-23 0:03 ` Alexei Starovoitov ` (2 more replies) 1 sibling, 3 replies; 85+ messages in thread From: Andy Lutomirski @ 2019-02-22 23:59 UTC (permalink / raw) To: Jann Horn Cc: Nadav Amit, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Andy Lutomirski, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa On Fri, Feb 22, 2019 at 3:02 PM Jann Horn <jannh@google.com> wrote: > > On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote: > > > On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: > > > > > >> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: > > >> > > >> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: > > >>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: > > >>>> > > >>>> (adding some people from the text_poke series to the thread, removing stable@) > > >>>> > > >>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: > > >>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > >>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > > >>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 > > >>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > >>>>>>> > > >>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > >>>>>>>>> > > >>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not > > >>>>>>>>> allow user accesses. The easiest way to do that is actually likely to > > >>>>>>>>> use the "unsafe_get_user()" functions *without* doing a > > >>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault > > >>>>>>>>> on a kernel access to user space. > > >>>>>>>> > > >>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > > >>>>>>>> and users pass both user and kernel addresses into it and expect > > >>>>>>>> that the helper will actually try to read from that address. > > >>>>>>>> > > >>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses > > >>>>>>>> it will break the expectations. > > >>>>>>>> How do we solve it in bpf_probe_read? > > >>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > > >>>>>>>> in the loop? > > >>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow > > >>>>>>>> and shows up in their perf report. > > >>>>>>> > > >>>>>>> We're changing kprobes to add a specific flag to say that we want to > > >>>>>>> differentiate between kernel or user reads. Can this be done with > > >>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single > > >>>>>> > > >>>>>> so you're saying you will break existing kprobe scripts? > > >>>>>> I don't think it's a good idea. > > >>>>>> It's not acceptable to break bpf_probe_read uapi. > > >>>>> > > >>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. > > >>>>> > > >>>>> What to do about existing scripts is a different question. > > >>>> > > >>>> This lack of logical separation between user and kernel addresses > > >>>> might interact interestingly with the text_poke series, specifically > > >>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for > > >>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=HAbnDcrBne64JyPuVUMKmM7nQk67F%2BFvjuXEn8TmHeo%3D&reserved=0) > > >>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text > > >>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=vNRIMKtFDy%2F3z5FlTwDiJY6VGEV%2FMHgQPTdFSFtCo4s%3D&reserved=0), > > >>>> right? If someone manages to get a tracing BPF program to trigger in a > > >>>> task that has switched to the patching mm, could they use > > >>>> bpf_probe_write_user() - which uses probe_kernel_write() after > > >>>> checking that KERNEL_DS isn't active and that access_ok() passes - to > > >>>> overwrite kernel text that is mapped writable in the patching mm? > > >>> > > >>> Yes, this is a good point. I guess text_poke() should be defined with > > >>> “__kprobes” and open-code memcpy. > > >>> > > >>> Does it sound reasonable? > > >> > > >> Doesn't __text_poke() as implemented in the proposed patch use a > > >> couple other kernel functions, too? Like switch_mm_irqs_off() and > > >> pte_clear() (which can be a call into a separate function on paravirt > > >> kernels)? > > > > > > I will move the pte_clear() to be done after the poking mm was unloaded. > > > Give me a few minutes to send a sketch of what I think should be done. > > > > Err.. You are right, I don’t see an easy way of preventing a kprobe from > > being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. > > > > The reasonable solution seems to me as taking all the relevant pieces of > > code (and data) that might be used during text-poking and encapsulating them, so they > > will be set in a memory area which cannot be kprobe'd. This can also be > > useful to write-protect data structures of code that calls text_poke(), > > e.g., static-keys. It can also protect data on that stack that is used > > during text_poke() from being overwritten from another core. > > > > This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” > > when doing write-rarely operations. > > > > Right now, I think that text_poke() will keep being susceptible to such > > an attack, unless you have a better suggestion. > > A relatively simple approach might be to teach BPF not to run kprobe > programs and such in contexts where current->mm isn't the active mm? > Maybe using nmi_uaccess_okay(), or something like that? It looks like > perf_callchain_user() also already uses that. Except that a lot of > this code is x86-specific... This sounds like exactly the right solution. If you're running from some unknown context (like NMI or tracing), then you should check nmi_uaccess_okay(). I think we should just promote that to be a non-arch-specific function (that returns true by default) and check it the relevant bpf_probe_xyz() functions. Alexei, does that seem reasonable? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:59 ` Andy Lutomirski @ 2019-02-23 0:03 ` Alexei Starovoitov 2019-02-23 0:15 ` Nadav Amit 2019-02-25 13:36 ` Masami Hiramatsu 2 siblings, 0 replies; 85+ messages in thread From: Alexei Starovoitov @ 2019-02-23 0:03 UTC (permalink / raw) To: Andy Lutomirski Cc: Jann Horn, Nadav Amit, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa On Fri, Feb 22, 2019 at 03:59:30PM -0800, Andy Lutomirski wrote: > > > > A relatively simple approach might be to teach BPF not to run kprobe > > programs and such in contexts where current->mm isn't the active mm? > > Maybe using nmi_uaccess_okay(), or something like that? It looks like > > perf_callchain_user() also already uses that. Except that a lot of > > this code is x86-specific... > > This sounds like exactly the right solution. If you're running from > some unknown context (like NMI or tracing), then you should check > nmi_uaccess_okay(). I think we should just promote that to be a > non-arch-specific function (that returns true by default) and check it > the relevant bpf_probe_xyz() functions. > > Alexei, does that seem reasonable? yep. I think that should work. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:59 ` Andy Lutomirski 2019-02-23 0:03 ` Alexei Starovoitov @ 2019-02-23 0:15 ` Nadav Amit 2019-02-24 19:35 ` Andy Lutomirski 2019-02-25 13:36 ` Masami Hiramatsu 2 siblings, 1 reply; 85+ messages in thread From: Nadav Amit @ 2019-02-23 0:15 UTC (permalink / raw) To: Andy Lutomirski Cc: Jann Horn, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa > On Feb 22, 2019, at 3:59 PM, Andy Lutomirski <luto@kernel.org> wrote: > > On Fri, Feb 22, 2019 at 3:02 PM Jann Horn <jannh@google.com> wrote: >> On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote: >>>> On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: >>>> >>>>> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: >>>>> >>>>> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: >>>>>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: >>>>>>> >>>>>>> (adding some people from the text_poke series to the thread, removing stable@) >>>>>>> >>>>>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: >>>>>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: >>>>>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 >>>>>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: >>>>>>>>>>>> >>>>>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not >>>>>>>>>>>> allow user accesses. The easiest way to do that is actually likely to >>>>>>>>>>>> use the "unsafe_get_user()" functions *without* doing a >>>>>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault >>>>>>>>>>>> on a kernel access to user space. >>>>>>>>>>> >>>>>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >>>>>>>>>>> and users pass both user and kernel addresses into it and expect >>>>>>>>>>> that the helper will actually try to read from that address. >>>>>>>>>>> >>>>>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses >>>>>>>>>>> it will break the expectations. >>>>>>>>>>> How do we solve it in bpf_probe_read? >>>>>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte >>>>>>>>>>> in the loop? >>>>>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow >>>>>>>>>>> and shows up in their perf report. >>>>>>>>>> >>>>>>>>>> We're changing kprobes to add a specific flag to say that we want to >>>>>>>>>> differentiate between kernel or user reads. Can this be done with >>>>>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single >>>>>>>>> >>>>>>>>> so you're saying you will break existing kprobe scripts? >>>>>>>>> I don't think it's a good idea. >>>>>>>>> It's not acceptable to break bpf_probe_read uapi. >>>>>>>> >>>>>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. >>>>>>>> >>>>>>>> What to do about existing scripts is a different question. >>>>>>> >>>>>>> This lack of logical separation between user and kernel addresses >>>>>>> might interact interestingly with the text_poke series, specifically >>>>>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for >>>>>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cbab53e52cc5c4ac4419008d69921d1f1%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864767879731941&sdata=2tqD7udTCfNbcNLcj5SFpZt8WwK5NwtgaWMKm1Ye1EE%3D&reserved=0) >>>>>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text >>>>>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cbab53e52cc5c4ac4419008d69921d1f1%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864767879731941&sdata=7%2BLShgLKnra6xzSkxdJrCclCacfdE5IdczwScW83nuE%3D&reserved=0), >>>>>>> right? If someone manages to get a tracing BPF program to trigger in a >>>>>>> task that has switched to the patching mm, could they use >>>>>>> bpf_probe_write_user() - which uses probe_kernel_write() after >>>>>>> checking that KERNEL_DS isn't active and that access_ok() passes - to >>>>>>> overwrite kernel text that is mapped writable in the patching mm? >>>>>> >>>>>> Yes, this is a good point. I guess text_poke() should be defined with >>>>>> “__kprobes” and open-code memcpy. >>>>>> >>>>>> Does it sound reasonable? >>>>> >>>>> Doesn't __text_poke() as implemented in the proposed patch use a >>>>> couple other kernel functions, too? Like switch_mm_irqs_off() and >>>>> pte_clear() (which can be a call into a separate function on paravirt >>>>> kernels)? >>>> >>>> I will move the pte_clear() to be done after the poking mm was unloaded. >>>> Give me a few minutes to send a sketch of what I think should be done. >>> >>> Err.. You are right, I don’t see an easy way of preventing a kprobe from >>> being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. >>> >>> The reasonable solution seems to me as taking all the relevant pieces of >>> code (and data) that might be used during text-poking and encapsulating them, so they >>> will be set in a memory area which cannot be kprobe'd. This can also be >>> useful to write-protect data structures of code that calls text_poke(), >>> e.g., static-keys. It can also protect data on that stack that is used >>> during text_poke() from being overwritten from another core. >>> >>> This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” >>> when doing write-rarely operations. >>> >>> Right now, I think that text_poke() will keep being susceptible to such >>> an attack, unless you have a better suggestion. >> >> A relatively simple approach might be to teach BPF not to run kprobe >> programs and such in contexts where current->mm isn't the active mm? >> Maybe using nmi_uaccess_okay(), or something like that? It looks like >> perf_callchain_user() also already uses that. Except that a lot of >> this code is x86-specific... > > This sounds like exactly the right solution. If you're running from > some unknown context (like NMI or tracing), then you should check > nmi_uaccess_okay(). I think we should just promote that to be a > non-arch-specific function (that returns true by default) and check it > the relevant bpf_probe_xyz() functions. I can do that, but notice that switch_mm_irqs_off() writes to cpu_tlbstate.loaded_mm before it actually writes to CR3. So there are still a couple of instructions (and the load_new_mm_cr3()) in between that a kprobe can be set on, no? I can mark them as non-kprobable. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-23 0:15 ` Nadav Amit @ 2019-02-24 19:35 ` Andy Lutomirski 0 siblings, 0 replies; 85+ messages in thread From: Andy Lutomirski @ 2019-02-24 19:35 UTC (permalink / raw) To: Nadav Amit Cc: Andy Lutomirski, Jann Horn, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa On Sat, Feb 23, 2019 at 12:30 AM Nadav Amit <namit@vmware.com> wrote: > > > On Feb 22, 2019, at 3:59 PM, Andy Lutomirski <luto@kernel.org> wrote: > > > > On Fri, Feb 22, 2019 at 3:02 PM Jann Horn <jannh@google.com> wrote: > >> On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote: > >>>> On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: > >>>> > >>>>> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: > >>>>> > >>>>> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: > >>>>>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: > >>>>>>> > >>>>>>> (adding some people from the text_poke series to the thread, removing stable@) > >>>>>>> > >>>>>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: > >>>>>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >>>>>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > >>>>>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 > >>>>>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > >>>>>>>>>> > >>>>>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not > >>>>>>>>>>>> allow user accesses. The easiest way to do that is actually likely to > >>>>>>>>>>>> use the "unsafe_get_user()" functions *without* doing a > >>>>>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault > >>>>>>>>>>>> on a kernel access to user space. > >>>>>>>>>>> > >>>>>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > >>>>>>>>>>> and users pass both user and kernel addresses into it and expect > >>>>>>>>>>> that the helper will actually try to read from that address. > >>>>>>>>>>> > >>>>>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses > >>>>>>>>>>> it will break the expectations. > >>>>>>>>>>> How do we solve it in bpf_probe_read? > >>>>>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > >>>>>>>>>>> in the loop? > >>>>>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow > >>>>>>>>>>> and shows up in their perf report. > >>>>>>>>>> > >>>>>>>>>> We're changing kprobes to add a specific flag to say that we want to > >>>>>>>>>> differentiate between kernel or user reads. Can this be done with > >>>>>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single > >>>>>>>>> > >>>>>>>>> so you're saying you will break existing kprobe scripts? > >>>>>>>>> I don't think it's a good idea. > >>>>>>>>> It's not acceptable to break bpf_probe_read uapi. > >>>>>>>> > >>>>>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. > >>>>>>>> > >>>>>>>> What to do about existing scripts is a different question. > >>>>>>> > >>>>>>> This lack of logical separation between user and kernel addresses > >>>>>>> might interact interestingly with the text_poke series, specifically > >>>>>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for > >>>>>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cbab53e52cc5c4ac4419008d69921d1f1%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864767879731941&sdata=2tqD7udTCfNbcNLcj5SFpZt8WwK5NwtgaWMKm1Ye1EE%3D&reserved=0) > >>>>>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text > >>>>>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cbab53e52cc5c4ac4419008d69921d1f1%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864767879731941&sdata=7%2BLShgLKnra6xzSkxdJrCclCacfdE5IdczwScW83nuE%3D&reserved=0), > >>>>>>> right? If someone manages to get a tracing BPF program to trigger in a > >>>>>>> task that has switched to the patching mm, could they use > >>>>>>> bpf_probe_write_user() - which uses probe_kernel_write() after > >>>>>>> checking that KERNEL_DS isn't active and that access_ok() passes - to > >>>>>>> overwrite kernel text that is mapped writable in the patching mm? > >>>>>> > >>>>>> Yes, this is a good point. I guess text_poke() should be defined with > >>>>>> “__kprobes” and open-code memcpy. > >>>>>> > >>>>>> Does it sound reasonable? > >>>>> > >>>>> Doesn't __text_poke() as implemented in the proposed patch use a > >>>>> couple other kernel functions, too? Like switch_mm_irqs_off() and > >>>>> pte_clear() (which can be a call into a separate function on paravirt > >>>>> kernels)? > >>>> > >>>> I will move the pte_clear() to be done after the poking mm was unloaded. > >>>> Give me a few minutes to send a sketch of what I think should be done. > >>> > >>> Err.. You are right, I don’t see an easy way of preventing a kprobe from > >>> being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. > >>> > >>> The reasonable solution seems to me as taking all the relevant pieces of > >>> code (and data) that might be used during text-poking and encapsulating them, so they > >>> will be set in a memory area which cannot be kprobe'd. This can also be > >>> useful to write-protect data structures of code that calls text_poke(), > >>> e.g., static-keys. It can also protect data on that stack that is used > >>> during text_poke() from being overwritten from another core. > >>> > >>> This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” > >>> when doing write-rarely operations. > >>> > >>> Right now, I think that text_poke() will keep being susceptible to such > >>> an attack, unless you have a better suggestion. > >> > >> A relatively simple approach might be to teach BPF not to run kprobe > >> programs and such in contexts where current->mm isn't the active mm? > >> Maybe using nmi_uaccess_okay(), or something like that? It looks like > >> perf_callchain_user() also already uses that. Except that a lot of > >> this code is x86-specific... > > > > This sounds like exactly the right solution. If you're running from > > some unknown context (like NMI or tracing), then you should check > > nmi_uaccess_okay(). I think we should just promote that to be a > > non-arch-specific function (that returns true by default) and check it > > the relevant bpf_probe_xyz() functions. > > I can do that, but notice that switch_mm_irqs_off() writes to > cpu_tlbstate.loaded_mm before it actually writes to CR3. So there are still > a couple of instructions (and the load_new_mm_cr3()) in between that a > kprobe can be set on, no? But you can't mark then as no-nmi :) See the comment in nmi_uaccess_ok() -- the code is intended to work correctly during this window. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:59 ` Andy Lutomirski 2019-02-23 0:03 ` Alexei Starovoitov 2019-02-23 0:15 ` Nadav Amit @ 2019-02-25 13:36 ` Masami Hiramatsu 2 siblings, 0 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-25 13:36 UTC (permalink / raw) To: Andy Lutomirski Cc: Jann Horn, Nadav Amit, Alexei Starovoitov, Steven Rostedt, Linus Torvalds, Masami Hiramatsu, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, Changbin Du, Kees Cook, Daniel Borkmann, Network Development, bpf, Rick Edgecombe, Dave Hansen, Peter Zijlstra (Intel), Igor Stoppa On Fri, 22 Feb 2019 15:59:30 -0800 Andy Lutomirski <luto@kernel.org> wrote: > On Fri, Feb 22, 2019 at 3:02 PM Jann Horn <jannh@google.com> wrote: > > > > On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote: > > > > On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote: > > > > > > > >> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote: > > > >> > > > >> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote: > > > >>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote: > > > >>>> > > > >>>> (adding some people from the text_poke series to the thread, removing stable@) > > > >>>> > > > >>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote: > > > >>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > >>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > > > >>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800 > > > >>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > >>>>>>> > > > >>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > >>>>>>>>> > > > >>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not > > > >>>>>>>>> allow user accesses. The easiest way to do that is actually likely to > > > >>>>>>>>> use the "unsafe_get_user()" functions *without* doing a > > > >>>>>>>>> uaccess_begin(), which will mean that modern CPU's will simply fault > > > >>>>>>>>> on a kernel access to user space. > > > >>>>>>>> > > > >>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > > > >>>>>>>> and users pass both user and kernel addresses into it and expect > > > >>>>>>>> that the helper will actually try to read from that address. > > > >>>>>>>> > > > >>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses > > > >>>>>>>> it will break the expectations. > > > >>>>>>>> How do we solve it in bpf_probe_read? > > > >>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > > > >>>>>>>> in the loop? > > > >>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow > > > >>>>>>>> and shows up in their perf report. > > > >>>>>>> > > > >>>>>>> We're changing kprobes to add a specific flag to say that we want to > > > >>>>>>> differentiate between kernel or user reads. Can this be done with > > > >>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single > > > >>>>>> > > > >>>>>> so you're saying you will break existing kprobe scripts? > > > >>>>>> I don't think it's a good idea. > > > >>>>>> It's not acceptable to break bpf_probe_read uapi. > > > >>>>> > > > >>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x. > > > >>>>> > > > >>>>> What to do about existing scripts is a different question. > > > >>>> > > > >>>> This lack of logical separation between user and kernel addresses > > > >>>> might interact interestingly with the text_poke series, specifically > > > >>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for > > > >>>> patching" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-6-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=HAbnDcrBne64JyPuVUMKmM7nQk67F%2BFvjuXEn8TmHeo%3D&reserved=0) > > > >>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text > > > >>>> poking" (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20190221234451.17632-7-rick.p.edgecombe%40intel.com%2F&data=02%7C01%7Cnamit%40vmware.com%7Cf2513009ef734ecd6b0d08d69913a5ae%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636864707020821793&sdata=vNRIMKtFDy%2F3z5FlTwDiJY6VGEV%2FMHgQPTdFSFtCo4s%3D&reserved=0), > > > >>>> right? If someone manages to get a tracing BPF program to trigger in a > > > >>>> task that has switched to the patching mm, could they use > > > >>>> bpf_probe_write_user() - which uses probe_kernel_write() after > > > >>>> checking that KERNEL_DS isn't active and that access_ok() passes - to > > > >>>> overwrite kernel text that is mapped writable in the patching mm? > > > >>> > > > >>> Yes, this is a good point. I guess text_poke() should be defined with > > > >>> “__kprobes” and open-code memcpy. > > > >>> > > > >>> Does it sound reasonable? > > > >> > > > >> Doesn't __text_poke() as implemented in the proposed patch use a > > > >> couple other kernel functions, too? Like switch_mm_irqs_off() and > > > >> pte_clear() (which can be a call into a separate function on paravirt > > > >> kernels)? > > > > > > > > I will move the pte_clear() to be done after the poking mm was unloaded. > > > > Give me a few minutes to send a sketch of what I think should be done. > > > > > > Err.. You are right, I don’t see an easy way of preventing a kprobe from > > > being set on switch_mm_irqs_off(), and open-coding this monster is too ugly. > > > > > > The reasonable solution seems to me as taking all the relevant pieces of > > > code (and data) that might be used during text-poking and encapsulating them, so they > > > will be set in a memory area which cannot be kprobe'd. This can also be > > > useful to write-protect data structures of code that calls text_poke(), > > > e.g., static-keys. It can also protect data on that stack that is used > > > during text_poke() from being overwritten from another core. > > > > > > This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves” > > > when doing write-rarely operations. > > > > > > Right now, I think that text_poke() will keep being susceptible to such > > > an attack, unless you have a better suggestion. > > > > A relatively simple approach might be to teach BPF not to run kprobe > > programs and such in contexts where current->mm isn't the active mm? > > Maybe using nmi_uaccess_okay(), or something like that? It looks like > > perf_callchain_user() also already uses that. Except that a lot of > > this code is x86-specific... > > This sounds like exactly the right solution. If you're running from > some unknown context (like NMI or tracing), then you should check > nmi_uaccess_okay(). I think we should just promote that to be a > non-arch-specific function (that returns true by default) and check it > the relevant bpf_probe_xyz() functions. This treat may also need for my work, like probe_user_read() we should fail if nmi_uaccess_okay(). Thank you, > > Alexei, does that seem reasonable? -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:27 ` Alexei Starovoitov 2019-02-22 19:30 ` Steven Rostedt @ 2019-02-22 21:20 ` Linus Torvalds 2019-02-22 21:38 ` David Miller 2019-02-26 15:24 ` Joel Fernandes 2 siblings, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-22 21:20 UTC (permalink / raw) To: Alexei Starovoitov Cc: Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 11:27 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > and users pass both user and kernel addresses into it and expect > that the helper will actually try to read from that address. As mentioned earlier in the thread, that's actually fundamentally broken. There are architectures that have physically separate address spaces, with the same pointer value in both kernel and user space. They are rare, but they exist. At least sparc32 and the old 4G:4G split x86. So a pointer really should always unambiguously always be explicitly _either_ a kernel pointer, or a user pointer. You can't have "this is a pointer", and then try to figure it out by looking at the value. That may happen to work on x86-64, but it's literally a "happen to work on the most common architectures", not a design thing. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 21:20 ` Linus Torvalds @ 2019-02-22 21:38 ` David Miller 2019-02-22 21:59 ` Linus Torvalds 0 siblings, 1 reply; 85+ messages in thread From: David Miller @ 2019-02-22 21:38 UTC (permalink / raw) To: torvalds Cc: alexei.starovoitov, mhiramat, rostedt, luto, linux-kernel, mingo, akpm, stable, changbin.du, jannh, keescook, luto, daniel, netdev, bpf From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 22 Feb 2019 13:20:58 -0800 > On Fri, Feb 22, 2019 at 11:27 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> >> On bpf side the bpf_probe_read() helper just calls probe_kernel_read() >> and users pass both user and kernel addresses into it and expect >> that the helper will actually try to read from that address. > > As mentioned earlier in the thread, that's actually fundamentally broken. > > There are architectures that have physically separate address spaces, > with the same pointer value in both kernel and user space. > > They are rare, but they exist. At least sparc32 and the old 4G:4G split x86. And sparc64. > So a pointer really should always unambiguously always be explicitly > _either_ a kernel pointer, or a user pointer. You can't have "this is > a pointer", and then try to figure it out by looking at the value. > That may happen to work on x86-64, but it's literally a "happen to > work on the most common architectures", not a design thing. Don't be surprised if we see more separation like this in the future too. So it's not a smart thing to code against even if you can discount all of the examples Linus gives above. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 21:38 ` David Miller @ 2019-02-22 21:59 ` Linus Torvalds 2019-02-22 22:51 ` Alexei Starovoitov 2019-02-26 3:57 ` Christoph Hellwig 0 siblings, 2 replies; 85+ messages in thread From: Linus Torvalds @ 2019-02-22 21:59 UTC (permalink / raw) To: David Miller Cc: Alexei Starovoitov, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 1:38 PM David Miller <davem@davemloft.net> wrote: > > Don't be surprised if we see more separation like this in the future too. Yes, with the whole meltdown fiasco, there's actually more pressure to add more support for separation of kernel/user address spaces. As Andy pointed out, it's been discussed as a future wish-list for x86-64 too. But yeah, right now the *common* architectures all distinguish kernel and user space by pointers (ie x86-64, arm64 and powerpc). Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 21:59 ` Linus Torvalds @ 2019-02-22 22:51 ` Alexei Starovoitov 2019-02-22 23:11 ` Jann Horn 2019-02-22 23:16 ` Linus Torvalds 2019-02-26 3:57 ` Christoph Hellwig 1 sibling, 2 replies; 85+ messages in thread From: Alexei Starovoitov @ 2019-02-22 22:51 UTC (permalink / raw) To: Linus Torvalds Cc: David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 01:59:10PM -0800, Linus Torvalds wrote: > On Fri, Feb 22, 2019 at 1:38 PM David Miller <davem@davemloft.net> wrote: > > > > Don't be surprised if we see more separation like this in the future too. > > Yes, with the whole meltdown fiasco, there's actually more pressure to > add more support for separation of kernel/user address spaces. As Andy > pointed out, it's been discussed as a future wish-list for x86-64 too. > > But yeah, right now the *common* architectures all distinguish kernel > and user space by pointers (ie x86-64, arm64 and powerpc). That's all fine. I'm missing rationale for making probe_kernel_read() fail on user addresses. What is fundamentally wrong with a function probe_any_address_read() ? For context, typical bpf kprobe program looks like this: #define probe_read(P) \ ({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;}) SEC("kprobe/__set_task_comm") int bpf_prog(struct pt_regs *ctx) { struct signal_struct *signal; struct task_struct *tsk; char oldcomm[16] = {}; char newcomm[16] = {}; u16 oom_score_adj; u32 pid; tsk = (void *)PT_REGS_PARM1(ctx); pid = probe_read(tsk->pid); bpf_probe_read(oldcomm, sizeof(oldcomm), &tsk->comm); bpf_probe_read(newcomm, sizeof(newcomm), (void *)PT_REGS_PARM2(ctx)); signal = probe_read(tsk->signal); oom_score_adj = probe_read(signal->oom_score_adj); ... } where PT_REGS_PARMx macros are defined per architecture. On x86 it's #define PT_REGS_PARM1(x) ((x)->di) The program writer has to know the meaning of function arguments. In this example they need to know that __set_task_comm is defined as void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) in the kernel. Right now these programs just call bpf_probe_read() on whatever data they need to access and not differentiating whether it's user or kernel. One idea we discussed is to split bpf_probe_read() into kernel_read and user_read helpers, but in the BPF verifier we cannot determine which address space the program wants to access. The prog writer needs to manually analyze the program to use correct one. But mistakes are possible and cannot be fatal. On the kernel side we have to be safe. Both probe_kernel_read and probe_user_read must not panic if a pointer from wrong address space was passed. Hence my preference is to keep probe_kernel_read as "try read any address". The function can be renamed to indicate so. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 22:51 ` Alexei Starovoitov @ 2019-02-22 23:11 ` Jann Horn 2019-02-22 23:16 ` David Miller 2019-02-22 23:16 ` Linus Torvalds 1 sibling, 1 reply; 85+ messages in thread From: Jann Horn @ 2019-02-22 23:11 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linus Torvalds, David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 11:51 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Feb 22, 2019 at 01:59:10PM -0800, Linus Torvalds wrote: > > On Fri, Feb 22, 2019 at 1:38 PM David Miller <davem@davemloft.net> wrote: > > > > > > Don't be surprised if we see more separation like this in the future too. > > > > Yes, with the whole meltdown fiasco, there's actually more pressure to > > add more support for separation of kernel/user address spaces. As Andy > > pointed out, it's been discussed as a future wish-list for x86-64 too. > > > > But yeah, right now the *common* architectures all distinguish kernel > > and user space by pointers (ie x86-64, arm64 and powerpc). > > That's all fine. I'm missing rationale for making probe_kernel_read() > fail on user addresses. > What is fundamentally wrong with a function probe_any_address_read() ? I think what Linus is saying is: There are some scenarios (like a system with the old 4G/4G X86 patch) where *the same* address can refer to two different pieces of memory, depending on whether you interpret it as a kernel pointer or a user pointer. So for example, if your BPF program tries to read tsk->comm, it works, but if the BPF program then tries to read from PT_REGS_PARM2(ctx), it's going to actually interpret the userspace address as a kernel address and read completely different memory. On top of that, from the security angle, this means that if a user passes a kernel pointer into a syscall, they can trick a tracing BPF program into looking at random kernel memory instead of the user's memory. That may or may not be problematic, depending on what you do afterwards with the data you've read. (For example, if this is a service that collects performance data and then saves it to some world-readable location on disk because the data it is collecting (including comm strings) isn't supposed to be sensitive, you might have a problem.) > For context, typical bpf kprobe program looks like this: > #define probe_read(P) \ > ({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;}) > SEC("kprobe/__set_task_comm") > int bpf_prog(struct pt_regs *ctx) > { > struct signal_struct *signal; > struct task_struct *tsk; > char oldcomm[16] = {}; > char newcomm[16] = {}; > u16 oom_score_adj; > u32 pid; > > tsk = (void *)PT_REGS_PARM1(ctx); > pid = probe_read(tsk->pid); > bpf_probe_read(oldcomm, sizeof(oldcomm), &tsk->comm); > bpf_probe_read(newcomm, sizeof(newcomm), (void *)PT_REGS_PARM2(ctx)); > signal = probe_read(tsk->signal); > oom_score_adj = probe_read(signal->oom_score_adj); > ... > } > > where PT_REGS_PARMx macros are defined per architecture. > On x86 it's #define PT_REGS_PARM1(x) ((x)->di) > > The program writer has to know the meaning of function arguments. > In this example they need to know that __set_task_comm is defined as > void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) in the kernel. > > Right now these programs just call bpf_probe_read() on whatever data > they need to access and not differentiating whether it's user or kernel. > > One idea we discussed is to split bpf_probe_read() into kernel_read and user_read > helpers, but in the BPF verifier we cannot determine which address space > the program wants to access. The prog writer needs to manually analyze the program > to use correct one. But mistakes are possible and cannot be fatal. > On the kernel side we have to be safe. > Both probe_kernel_read and probe_user_read must not panic if a pointer > from wrong address space was passed. > > Hence my preference is to keep probe_kernel_read as "try read any address". > The function can be renamed to indicate so. > ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:11 ` Jann Horn @ 2019-02-22 23:16 ` David Miller 0 siblings, 0 replies; 85+ messages in thread From: David Miller @ 2019-02-22 23:16 UTC (permalink / raw) To: jannh Cc: alexei.starovoitov, torvalds, mhiramat, rostedt, luto, linux-kernel, mingo, akpm, stable, changbin.du, keescook, luto, daniel, netdev, bpf From: Jann Horn <jannh@google.com> Date: Sat, 23 Feb 2019 00:11:58 +0100 > I think what Linus is saying is: There are some scenarios (like a > system with the old 4G/4G X86 patch) where *the same* address can > refer to two different pieces of memory, depending on whether you > interpret it as a kernel pointer or a user pointer. Exactly. On sparc64 the kernel is mapped exactly at the same virtual addresses as userspace processes usually are mapped, even 32-bit ones. The difference is the MMU context only. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 22:51 ` Alexei Starovoitov 2019-02-22 23:11 ` Jann Horn @ 2019-02-22 23:16 ` Linus Torvalds 2019-02-22 23:56 ` Alexei Starovoitov 1 sibling, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-22 23:16 UTC (permalink / raw) To: Alexei Starovoitov Cc: David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 2:51 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > That's all fine. I'm missing rationale for making probe_kernel_read() > fail on user addresses. Because it already WON'T WORK in general! > What is fundamentally wrong with a function probe_any_address_read() ? What part of "the same pointer value can be a user address and a kernel address" is not getting through? The user address space and the kernel address space have separate page tables on some architectures. We used to avoid it on x86, because switching address spaces was expensive, but even on x86 some vendors did it on 32-bit simply to get 4GB of user (and kernel) address space. And now we end up doing it anyway just because of meltdown. So a kernel pointer value of 0x12345678 could be a value kernel pointer pointing to some random kmalloc'ed kernel memory, and a user pointer value of 0x12345678 could be a valid _user_ pointer pointing to some user mapping. See? If you access a user pointer, you need to use a user accessor function (eg "get_user()"), while if you access a kernel pointer you need to just dereference it directly (unless you can't trust it, in which case you need to use a _different_ accessor function). The fact that user and kernel pointers happen to be distinct on x86-64 (right now) is just a random implementation detail. Really. I didn't realize how many people seem to have been confused about this. But it's always been true. It's just that the common architectures have had that "one single address space for both kernel and user pointers" in practice. In fact, the *very* first kernel version had separate address spaces for kernel and user mode even on x86 (using segments, not paging). So it has literally been true since day one in Linux that a kernel address can be indistinguishable from a user address from a pure value standpoint. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:16 ` Linus Torvalds @ 2019-02-22 23:56 ` Alexei Starovoitov 2019-02-23 0:08 ` Linus Torvalds 2019-02-23 4:51 ` Masami Hiramatsu 0 siblings, 2 replies; 85+ messages in thread From: Alexei Starovoitov @ 2019-02-22 23:56 UTC (permalink / raw) To: Linus Torvalds Cc: David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 03:16:35PM -0800, Linus Torvalds wrote: > > So a kernel pointer value of 0x12345678 could be a value kernel > pointer pointing to some random kmalloc'ed kernel memory, and a user > pointer value of 0x12345678 could be a valid _user_ pointer pointing > to some user mapping. > > See? > > If you access a user pointer, you need to use a user accessor function > (eg "get_user()"), while if you access a kernel pointer you need to > just dereference it directly (unless you can't trust it, in which case > you need to use a _different_ accessor function). that was clear already. Reading 0x12345678 via probe_kernel_read can return valid value and via get_user() can return another valid value on _some_ architectures. > The fact that user and kernel pointers happen to be distinct on x86-64 > (right now) is just a random implementation detail. yes and my point that people already rely on this implementation detail. Say we implement int bpf_probe_read(void *val, void *unsafe_ptr) { if (probe_kernel_read(val, unsafe_ptr) == OK) { return 0; } else (get_user(val, unsafe_ptr) == OK) { return 0; } else { *val = 0; return -EFAULT; } } It will preserve existing bpf_probe_read() behavior on x86. If x86 implementation changes tomorrow then progs that read user addresses may start failing randomly because first probe_kernel_read() will be returning random values from kernel memory and that's no good, but at least we won't be breaking them today, so we have time to introduce bpf_user_read and bpf_kernel_read and folks have time to adopt them. Imo that's much better than making current bpf_probe_read() fail on user addresses today and not providing a non disruptive path forward. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:56 ` Alexei Starovoitov @ 2019-02-23 0:08 ` Linus Torvalds 2019-02-23 2:28 ` Alexei Starovoitov 2019-02-23 4:51 ` Masami Hiramatsu 1 sibling, 1 reply; 85+ messages in thread From: Linus Torvalds @ 2019-02-23 0:08 UTC (permalink / raw) To: Alexei Starovoitov Cc: David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 3:56 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > It will preserve existing bpf_probe_read() behavior on x86. ... but that's the worst possible situation. It appears that people haven't understood that kernel and user addresses are distinct, and may have written programs that are fundamentally buggy. And we _want_ to make it clear that they are buggy on x86-64, exactly because x86-64 is the one that gets the most testing - by far. So if x86-64 continues working for buggy programs, then that only means that those bugs never get fixed. It would be much better to try to get those things fixed, and make the x86-64 implementation stricter, exactly so that people end up _realizing_ that they can't just think "a pointer is a pointer, and the context doesn't matter". From a pure functional safety standpoint, I thought bpf already knew what kind of a pointer it had? Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-23 0:08 ` Linus Torvalds @ 2019-02-23 2:28 ` Alexei Starovoitov 2019-02-23 2:32 ` Linus Torvalds 2019-02-23 3:02 ` Steven Rostedt 0 siblings, 2 replies; 85+ messages in thread From: Alexei Starovoitov @ 2019-02-23 2:28 UTC (permalink / raw) To: Linus Torvalds Cc: David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 04:08:59PM -0800, Linus Torvalds wrote: > On Fri, Feb 22, 2019 at 3:56 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > It will preserve existing bpf_probe_read() behavior on x86. > > ... but that's the worst possible situation. > > It appears that people haven't understood that kernel and user > addresses are distinct, and may have written programs that are > fundamentally buggy. > > And we _want_ to make it clear that they are buggy on x86-64, exactly > because x86-64 is the one that gets the most testing - by far. > > So if x86-64 continues working for buggy programs, then that only > means that those bugs never get fixed. > > It would be much better to try to get those things fixed, and make the > x86-64 implementation stricter, exactly so that people end up > _realizing_ that they can't just think "a pointer is a pointer, and > the context doesn't matter". > > From a pure functional safety standpoint, I thought bpf already knew > what kind of a pointer it had? when bpf verifier knows the type of pointer it allows direct access to it. That's the case for skb, socket, packet data, hash maps, arrays, stack, etc. Networking progs cannot call bpf_probe_read(). It's available to tracing progs only and their goal is to walk the kernel and user memory with addresses that cannot be statically verified at program load time. We are working on adding type information (BTF) to vmlinux. Soon we'll be able to tell the type of every kernel function argument. Right now arg1 and arg2 in a kprobed function are just u64 pt_regs->di, si. Soon we'll be able to precisely identify their C type. I completely agree on the direction that x86 is the architecture that sets an example and users need to learn the difference in pointers. I only disagree on timing. Right now users don't have an alternative. In our repo I counted ~400 calls to bpf_probe_read and about 10 times more 'indirect' calls. What's happening with 'indirect' calls is BCC toolchain using clang to automatically replace struct_a->field_foo access with bpf_probe_read(struct_a + offsetof(typeof(struct_a), field_foo)); If we had __user vs __kernel annotation available to clang we could have taught BCC to replace this '->' dereference with appropriate kernel vs user helper. Also we need to teach GCC to recognize __user and store into dwarf, so we can take it further into BTF and verify later. Also I think disallowing bpf_probe_read() to read user pointer will not make a desired teaching effect. It will only cause painful debugging to folks when their progs will stop working. It's better to remove bpf_probe_read() completely. imo the process of teaching the users of kernel vs user pointer difference needs to be gradual. First we introduce bpf_probe_kernel_read and bpf_probe_user_read and introduce clang/gcc tooling to catch the mistakes. Going over this 400+ places and manually grepping kernel sources for __user keyword is not a great proposal if we want to keep those users. Once we have this working we can remove bpf_probe_read() altogether. Rejecting bpf prog at load time is a clear signal that user has to fix it (instead of changing run-time behavior). When the verifier gets even smarter it could potentially replace prob_read with probe_kernel_read and probe_user_read when it has that type info. imo this kernel release should finish as-is and in the next cycle we can change probe_kernel_read() to reject user address, have temporary workaround in bpf_probe_read() with probe_kernel_read+get_user hack, introduce new bpf helpers, new tooling and eventually remove buggy bpf_probe_read. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-23 2:28 ` Alexei Starovoitov @ 2019-02-23 2:32 ` Linus Torvalds 2019-02-23 3:02 ` Steven Rostedt 1 sibling, 0 replies; 85+ messages in thread From: Linus Torvalds @ 2019-02-23 2:32 UTC (permalink / raw) To: Alexei Starovoitov Cc: David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 6:29 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > imo this kernel release should finish as-is and in the next cycle we can > change probe_kernel_read() to reject user address [..] Absolutely. Nothing is going to change right now for 5.0, which is imminent. It's really a "long-term we really need to fix this", where the only question is how soon "long-term" is. Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-23 2:28 ` Alexei Starovoitov 2019-02-23 2:32 ` Linus Torvalds @ 2019-02-23 3:02 ` Steven Rostedt 1 sibling, 0 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-23 3:02 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linus Torvalds, David Miller, Masami Hiramatsu, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, 22 Feb 2019 18:28:53 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > First we introduce bpf_probe_kernel_read and bpf_probe_user_read and > introduce clang/gcc tooling to catch the mistakes. > Going over this 400+ places and manually grepping kernel sources > for __user keyword is not a great proposal if we want to keep those users. > Once we have this working we can remove bpf_probe_read() altogether. > Rejecting bpf prog at load time is a clear signal that user has to fix it > (instead of changing run-time behavior). > When the verifier gets even smarter it could potentially replace prob_read > with probe_kernel_read and probe_user_read when it has that type info. I was about to suggest this approach. Document that bpf_probe_read() is known to be buggy and will be deprecated in the future, and that all new bpf scripts should start using bpf_probe_kernel/user_read() instead (after they have been implemented of course). And give time for people to fix their current scripts. Perhaps in the near future, trigger some kind of warning for users that use bpf_probe_read(). -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 23:56 ` Alexei Starovoitov 2019-02-23 0:08 ` Linus Torvalds @ 2019-02-23 4:51 ` Masami Hiramatsu 1 sibling, 0 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-23 4:51 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linus Torvalds, David Miller, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, 22 Feb 2019 15:56:20 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Fri, Feb 22, 2019 at 03:16:35PM -0800, Linus Torvalds wrote: > > > > So a kernel pointer value of 0x12345678 could be a value kernel > > pointer pointing to some random kmalloc'ed kernel memory, and a user > > pointer value of 0x12345678 could be a valid _user_ pointer pointing > > to some user mapping. > > > > See? > > > > If you access a user pointer, you need to use a user accessor function > > (eg "get_user()"), while if you access a kernel pointer you need to > > just dereference it directly (unless you can't trust it, in which case > > you need to use a _different_ accessor function). > > that was clear already. > Reading 0x12345678 via probe_kernel_read can return valid value > and via get_user() can return another valid value on _some_ architectures. > > > The fact that user and kernel pointers happen to be distinct on x86-64 > > (right now) is just a random implementation detail. > > yes and my point that people already rely on this implementation detail. > Say we implement > int bpf_probe_read(void *val, void *unsafe_ptr) > { > if (probe_kernel_read(val, unsafe_ptr) == OK) { > return 0; > } else (get_user(val, unsafe_ptr) == OK) { > return 0; > } else { > *val = 0; > return -EFAULT; > } > } Note that we can not use get_user() form kprobe handler. If you use it, you have to prepare fault_handler() and make bpf itself can be aborted. So, maybe you can use probe_user_read(). Hmm, however, it still doesn't work correctly on "some" architecture, since whether a pointer (address) points user-space or kernel-space depends on the context. In kprobe/bpf, the context means where you put the probe and which pointer you record. I think only "__user" tag tells us which one is user-space. But unfortunately, that "__user" tag is only for compiler or checker, not for runtime binary. Such useful attribute goes away when we execute it. So, even if we introduce "ustring", ftrace/perf users has to decide to use it by themselves. As far as I know, DWARF(debuginfo) also doesn't have that attribute. So perf-probe can not help it from debuginfo. (Maybe if we introduce C parser, it might be detected...) > It will preserve existing bpf_probe_read() behavior on x86. > If x86 implementation changes tomorrow then progs that read user > addresses may start failing randomly because first probe_kernel_read() > will be returning random values from kernel memory and that's no good, > but at least we won't be breaking them today, so we have time to > introduce bpf_user_read and bpf_kernel_read and folks have time to adopt them. I see. I think bpf also has to introduce new bpf_probe_read_user() and keep bpf_probe_read() for kernel dataa only. > Imo that's much better than making current bpf_probe_read() fail > on user addresses today and not providing a non disruptive path forward. Agreed. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 21:59 ` Linus Torvalds 2019-02-22 22:51 ` Alexei Starovoitov @ 2019-02-26 3:57 ` Christoph Hellwig 1 sibling, 0 replies; 85+ messages in thread From: Christoph Hellwig @ 2019-02-26 3:57 UTC (permalink / raw) To: Linus Torvalds Cc: David Miller, Alexei Starovoitov, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andrew Lutomirski, Daniel Borkmann, Netdev, bpf On Fri, Feb 22, 2019 at 01:59:10PM -0800, Linus Torvalds wrote: > On Fri, Feb 22, 2019 at 1:38 PM David Miller <davem@davemloft.net> wrote: > > > > Don't be surprised if we see more separation like this in the future too. > > Yes, with the whole meltdown fiasco, there's actually more pressure to > add more support for separation of kernel/user address spaces. As Andy > pointed out, it's been discussed as a future wish-list for x86-64 too. S/390 is another example. I've also proposed a RISC-V extension for it, including prototypes for Rocketchip and Qemu, and a Linux kernel support, but it didn't go any way. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 19:27 ` Alexei Starovoitov 2019-02-22 19:30 ` Steven Rostedt 2019-02-22 21:20 ` Linus Torvalds @ 2019-02-26 15:24 ` Joel Fernandes 2019-02-28 12:29 ` Masami Hiramatsu 2 siblings, 1 reply; 85+ messages in thread From: Joel Fernandes @ 2019-02-26 15:24 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linus Torvalds, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, will.deacon, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf On Fri, Feb 22, 2019 at 11:27:05AM -0800, Alexei Starovoitov wrote: > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > allow user accesses. The easiest way to do that is actually likely to > > use the "unsafe_get_user()" functions *without* doing a > > uaccess_begin(), which will mean that modern CPU's will simply fault > > on a kernel access to user space. > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > and users pass both user and kernel addresses into it and expect > that the helper will actually try to read from that address. Slightly related and FWIW, BCC's eBPF-based opensnoop tool [1] installs a kprobe on do_sys_open to monitor calls to the open syscall globally. do_sys_open() has prototype: long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode); This causes a "blank" filename to be displayed by opensnoop when I run it on my Pixel 3 (arm64), possibly because this is a user pointer. However, it works fine on x86-64. So it seems to me that on arm64, reading user pointers directly still doesn't work even if there is a distinction between user/kernel addresses. In that case reading the user pointer using user accessors (possibly using bpf_probe_user_read helper) should be needed to fix this issue (as Yonghong also privately discussed with me). [1] https://github.com/iovisor/bcc/blob/master/tools/opensnoop.py#L140 thanks! - Joel > > If __probe_kernel_read will suddenly start failing on all user addresses > it will break the expectations. > How do we solve it in bpf_probe_read? > Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > in the loop? > That's doable, but people already complain that bpf_probe_read() is slow > and shows up in their perf report. > ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-26 15:24 ` Joel Fernandes @ 2019-02-28 12:29 ` Masami Hiramatsu 2019-02-28 15:18 ` Joel Fernandes 0 siblings, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-28 12:29 UTC (permalink / raw) To: Joel Fernandes Cc: Alexei Starovoitov, Linus Torvalds, Masami Hiramatsu, Steven Rostedt, Andy Lutomirski, will.deacon, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf On Tue, 26 Feb 2019 10:24:47 -0500 Joel Fernandes <joel@joelfernandes.org> wrote: > On Fri, Feb 22, 2019 at 11:27:05AM -0800, Alexei Starovoitov wrote: > > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > > allow user accesses. The easiest way to do that is actually likely to > > > use the "unsafe_get_user()" functions *without* doing a > > > uaccess_begin(), which will mean that modern CPU's will simply fault > > > on a kernel access to user space. > > > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > > and users pass both user and kernel addresses into it and expect > > that the helper will actually try to read from that address. > > Slightly related and FWIW, BCC's eBPF-based opensnoop tool [1] installs a > kprobe on do_sys_open to monitor calls to the open syscall globally. > > do_sys_open() has prototype: > > long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode); > > This causes a "blank" filename to be displayed by opensnoop when I run it on > my Pixel 3 (arm64), possibly because this is a user pointer. However, it > works fine on x86-64. > > So it seems to me that on arm64, reading user pointers directly still doesn't > work even if there is a distinction between user/kernel addresses. In that > case reading the user pointer using user accessors (possibly using > bpf_probe_user_read helper) should be needed to fix this issue (as Yonghong > also privately discussed with me). OK, it sounds like the same issue. Please add a bpf_user_read() and use it for __user pointer. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-28 12:29 ` Masami Hiramatsu @ 2019-02-28 15:18 ` Joel Fernandes 0 siblings, 0 replies; 85+ messages in thread From: Joel Fernandes @ 2019-02-28 15:18 UTC (permalink / raw) To: Masami Hiramatsu Cc: Alexei Starovoitov, Linus Torvalds, Steven Rostedt, Andy Lutomirski, will.deacon, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski, daniel, netdev, bpf, yhs On Thu, Feb 28, 2019 at 09:29:13PM +0900, Masami Hiramatsu wrote: > On Tue, 26 Feb 2019 10:24:47 -0500 > Joel Fernandes <joel@joelfernandes.org> wrote: > > > On Fri, Feb 22, 2019 at 11:27:05AM -0800, Alexei Starovoitov wrote: > > > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > > > allow user accesses. The easiest way to do that is actually likely to > > > > use the "unsafe_get_user()" functions *without* doing a > > > > uaccess_begin(), which will mean that modern CPU's will simply fault > > > > on a kernel access to user space. > > > > > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > > > and users pass both user and kernel addresses into it and expect > > > that the helper will actually try to read from that address. > > > > Slightly related and FWIW, BCC's eBPF-based opensnoop tool [1] installs a > > kprobe on do_sys_open to monitor calls to the open syscall globally. > > > > do_sys_open() has prototype: > > > > long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode); > > > > This causes a "blank" filename to be displayed by opensnoop when I run it on > > my Pixel 3 (arm64), possibly because this is a user pointer. However, it > > works fine on x86-64. > > > > So it seems to me that on arm64, reading user pointers directly still doesn't > > work even if there is a distinction between user/kernel addresses. In that > > case reading the user pointer using user accessors (possibly using > > bpf_probe_user_read helper) should be needed to fix this issue (as Yonghong > > also privately discussed with me). > > OK, it sounds like the same issue. Please add a bpf_user_read() and use it > for __user pointer. CC'd Yonghong who said eariler to me he would add it, but I could add it too if he wants me to. thanks, - Joel ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-22 17:43 ` Linus Torvalds 2019-02-22 17:48 ` Andy Lutomirski 2019-02-22 19:27 ` Alexei Starovoitov @ 2019-02-23 3:47 ` Masami Hiramatsu 2019-02-24 0:44 ` Steven Rostedt 2 siblings, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-23 3:47 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Fri, 22 Feb 2019 09:43:14 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Feb 22, 2019 at 12:35 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > Or, can we do this? > > > > long __probe_user_read(void *dst, const void *src, size_t size) > > { > > Add a > > if (!access_ok(src, size)) > ret = -EFAULT; > else { > .. do the pagefault_disable() etc .. > } Since kprobes handler runs in IRQ context, we can not use access_ok() in it. (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) In arch/x86/include/asm/uaccess.h: #define access_ok(addr, size) \ ({ \ WARN_ON_IN_IRQ(); \ likely(!__range_not_ok(addr, size, user_addr_max())); \ }) Do we need acccess_ok_inatomic()? BTW, it seems a bit strange that this WARN_ON_IN_IRQ() is only in x86 access_ok() implementation, since CONFIG_DEBUG_ATOMIC_SLEEP(which defines WARN_ON_IN_IRQ) doesn't depend on x86, and access_ok() is widely used in kernel. I think it would be better that each arch provides __access_ok() and include/linux/uaccess.h provides access_ok() with WARN_ON_IN_IRQ(). > to after the "set_fs()", and it looks good to me. Make it clear that > yes, this works _only_ for user reads. > > Adn that makes all the games with "kernel_uaccess_faults_ok" > pointless, so you can just remove them. OK. > > (note that the "access_ok()" has to come after we've done "set_fs()", > because it takes the address limit from that). > > Also, since normally we'd expect that we already have USER_DS, it > might be worthwhile to do this with a wrapper, something along the > lines of > > mm_segment_t old_fs = get_fs(); > > if (segment_eq(old_fs, USER_DS)) > return __normal_probe_user_read(); > set_fs(USER_DS); > ret = __normal_probe_user_read(); > set_fs(old_fs); > return ret; > > and have that __normal_probe_user_read() just do > > if (!access_ok(src, size)) > return -EFAULT; > pagefault_disable(); > ret = __copy_from_user_inatomic(dst, ...); > pagefault_enable(); > return ret ? -EFAULT : 0; > > which looks more obvious. OK. > > Also, I would suggest that you just make the argument type be "const > void __user *", since the whole point is that this takes a user > pointer, and nothing else. Ah, right. > Then we should still probably fix up "__probe_kernel_read()" to not > allow user accesses. The easiest way to do that is actually likely to > use the "unsafe_get_user()" functions *without* doing a > uaccess_begin(), which will mean that modern CPU's will simply fault > on a kernel access to user space. Or, use __chk_user_ptr(ptr) to check it? Thank you, > > The nice thing about that is that usually developers will have access > to exactly those modern boxes, so the people who notice that it > doesn't work are the right people. > > Alternatively, we should just make it be architecture-specific, so > that architectures can decide "this address cannot be a kernel > address" and refuse to do it. > > Linus -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-23 3:47 ` Masami Hiramatsu @ 2019-02-24 0:44 ` Steven Rostedt 2019-02-24 4:38 ` Andy Lutomirski 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-24 0:44 UTC (permalink / raw) To: Masami Hiramatsu Cc: Linus Torvalds, Andy Lutomirski, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Sat, 23 Feb 2019 12:47:46 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > Since kprobes handler runs in IRQ context, we can not use access_ok() in it. > (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) Is it really IRQ context or exception context? That is, one (interrupts) happen for any task, but exceptions happen because of the software that is executed (like a breakpoint). Although you can have a kprobe trigger in an interrupt handler (where user access wouldn't make sense anyway). But there should be no problem with user access from an exception handler. -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-24 0:44 ` Steven Rostedt @ 2019-02-24 4:38 ` Andy Lutomirski 2019-02-24 15:17 ` Masami Hiramatsu 2019-02-25 16:48 ` Kees Cook 0 siblings, 2 replies; 85+ messages in thread From: Andy Lutomirski @ 2019-02-24 4:38 UTC (permalink / raw) To: Steven Rostedt Cc: Masami Hiramatsu, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski On Sat, Feb 23, 2019 at 4:44 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Sat, 23 Feb 2019 12:47:46 +0900 > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > Since kprobes handler runs in IRQ context, we can not use access_ok() in it. > > (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) > > Is it really IRQ context or exception context? That is, one > (interrupts) happen for any task, but exceptions happen because of the > software that is executed (like a breakpoint). Although you can have a > kprobe trigger in an interrupt handler (where user access wouldn't make > sense anyway). But there should be no problem with user access from an > exception handler. > Can we just get rid of this might_sleep()? access_ok() doesn't sleep as far as I know. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-24 4:38 ` Andy Lutomirski @ 2019-02-24 15:17 ` Masami Hiramatsu 2019-02-24 17:26 ` Linus Torvalds 2019-02-25 16:48 ` Kees Cook 1 sibling, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-24 15:17 UTC (permalink / raw) To: Andy Lutomirski Cc: Steven Rostedt, Masami Hiramatsu, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Sat, 23 Feb 2019 20:38:03 -0800 Andy Lutomirski <luto@kernel.org> wrote: > On Sat, Feb 23, 2019 at 4:44 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Sat, 23 Feb 2019 12:47:46 +0900 > > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > Since kprobes handler runs in IRQ context, we can not use access_ok() in it. > > > (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) > > > > Is it really IRQ context or exception context? That is, one > > (interrupts) happen for any task, but exceptions happen because of the > > software that is executed (like a breakpoint). Although you can have a > > kprobe trigger in an interrupt handler (where user access wouldn't make > > sense anyway). But there should be no problem with user access from an > > exception handler. > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > as far as I know. Hmm, which might_sleep() would you pointed? What I talked was a WARN_ON_ONCE(!in_task()) in access_ok() on x86 (only!), and in_task() just checks preempt_count. I guess PeterZ assumed that access_ok() is used only with user space access APIs (e.g. copy_from_user) which can cause page-fault and locks mm (and might sleep :)), but now we are trying to use access_ok() with new functions which disables page-fault and just return -EFAULT. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-24 15:17 ` Masami Hiramatsu @ 2019-02-24 17:26 ` Linus Torvalds 2019-02-25 2:40 ` Masami Hiramatsu 2019-02-25 8:33 ` Peter Zijlstra 0 siblings, 2 replies; 85+ messages in thread From: Linus Torvalds @ 2019-02-24 17:26 UTC (permalink / raw) To: Masami Hiramatsu Cc: Andy Lutomirski, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Sun, Feb 24, 2019 at 7:18 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Sat, 23 Feb 2019 20:38:03 -0800 > Andy Lutomirski <luto@kernel.org> wrote: > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > > as far as I know. > > Hmm, which might_sleep() would you pointed? What I talked was a > WARN_ON_ONCE(!in_task()) in access_ok() on x86 (only!), and in_task() just > checks preempt_count. So the in_task() check does kind of make sense. Using "access_ok()" outside of task context is certainly an odd thing, for several reasons. The main one being simply that outside of task context, the whole "which task" question is open, and you don't know if the task is the active one, and so it's not clear if whatever task you interrupt might have done "set_fs()" or not. So PeterZ isn't wrong: > I guess PeterZ assumed that access_ok() is used only with user space access > APIs (e.g. copy_from_user) which can cause page-fault and locks mm (and might > sleep :)), but now we are trying to use access_ok() with new functions which > disables page-fault and just return -EFAULT. .. but in this case, if we do it all *within* code that saves and restores the user access flag with get_fs/set_fs, access_ok() would be ok and it doesn't have the above issue. So access_ok() in _general_ is absolutely not safe to do from interrupts, but within the context of probing user memory from a tracing event it just happens to be ok. It would be lovely to have a special macro for this, and keep the warning for the general case, but because this is a "every architecture needs to build their own" it's probably too painful. PeterZ, do you remember the particular use case that triggered that commit 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok() context")? Linus ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-24 17:26 ` Linus Torvalds @ 2019-02-25 2:40 ` Masami Hiramatsu 2019-02-25 4:49 ` Andy Lutomirski 2019-02-25 8:33 ` Peter Zijlstra 1 sibling, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-25 2:40 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Sun, 24 Feb 2019 09:26:45 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sun, Feb 24, 2019 at 7:18 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > On Sat, 23 Feb 2019 20:38:03 -0800 > > Andy Lutomirski <luto@kernel.org> wrote: > > > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > > > as far as I know. > > > > Hmm, which might_sleep() would you pointed? What I talked was a > > WARN_ON_ONCE(!in_task()) in access_ok() on x86 (only!), and in_task() just > > checks preempt_count. > > So the in_task() check does kind of make sense. Using "access_ok()" > outside of task context is certainly an odd thing, for several > reasons. The main one being simply that outside of task context, the > whole "which task" question is open, and you don't know if the task is > the active one, and so it's not clear if whatever task you interrupt > might have done "set_fs()" or not. Ah I got it. Usual case access_ok() in IRQ handler is strange. > > So PeterZ isn't wrong: > > > I guess PeterZ assumed that access_ok() is used only with user space access > > APIs (e.g. copy_from_user) which can cause page-fault and locks mm (and might > > sleep :)), but now we are trying to use access_ok() with new functions which > > disables page-fault and just return -EFAULT. > > .. but in this case, if we do it all *within* code that saves and > restores the user access flag with get_fs/set_fs, access_ok() would be > ok and it doesn't have the above issue. > > So access_ok() in _general_ is absolutely not safe to do from > interrupts, but within the context of probing user memory from a > tracing event it just happens to be ok. Hmm, but user can specify user-memory access from the tracing event which is located in interrupt handler. So I understand that it is safe only if we correctly setup access flag with get_fs/set_fs, is that correct? > It would be lovely to have a special macro for this, and keep the > warning for the general case, but because this is a "every > architecture needs to build their own" it's probably too painful. Agreed. > > PeterZ, do you remember the particular use case that triggered that > commit 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok() > context")? > > Linus Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 2:40 ` Masami Hiramatsu @ 2019-02-25 4:49 ` Andy Lutomirski 2019-02-25 8:09 ` Masami Hiramatsu 0 siblings, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-25 4:49 UTC (permalink / raw) To: Masami Hiramatsu Cc: Linus Torvalds, Andy Lutomirski, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Sun, Feb 24, 2019 at 6:40 PM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Sun, 24 Feb 2019 09:26:45 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Sun, Feb 24, 2019 at 7:18 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > > On Sat, 23 Feb 2019 20:38:03 -0800 > > > Andy Lutomirski <luto@kernel.org> wrote: > > > > > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > > > > as far as I know. > > > > > > Hmm, which might_sleep() would you pointed? What I talked was a > > > WARN_ON_ONCE(!in_task()) in access_ok() on x86 (only!), and in_task() just > > > checks preempt_count. > > > > So the in_task() check does kind of make sense. Using "access_ok()" > > outside of task context is certainly an odd thing, for several > > reasons. The main one being simply that outside of task context, the > > whole "which task" question is open, and you don't know if the task is > > the active one, and so it's not clear if whatever task you interrupt > > might have done "set_fs()" or not. > > Ah I got it. Usual case access_ok() in IRQ handler is strange. > > > > > So PeterZ isn't wrong: > > > > > I guess PeterZ assumed that access_ok() is used only with user space access > > > APIs (e.g. copy_from_user) which can cause page-fault and locks mm (and might > > > sleep :)), but now we are trying to use access_ok() with new functions which > > > disables page-fault and just return -EFAULT. > > > > .. but in this case, if we do it all *within* code that saves and > > restores the user access flag with get_fs/set_fs, access_ok() would be > > ok and it doesn't have the above issue. > > > > So access_ok() in _general_ is absolutely not safe to do from > > interrupts, but within the context of probing user memory from a > > tracing event it just happens to be ok. > > Hmm, but user can specify user-memory access from the tracing event > which is located in interrupt handler. So I understand that it is safe > only if we correctly setup access flag with get_fs/set_fs, is that > correct? > > > It would be lovely to have a special macro for this, and keep the > > warning for the general case, but because this is a "every > > architecture needs to build their own" it's probably too painful. > > Agreed. This should probably go with whatever effort makes nmi_uaccess_ok() available on all architectures. That being said, how about just making copy_from_user_nmi() work on all architectures, even if it just fails unconditionally on some of them? ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 4:49 ` Andy Lutomirski @ 2019-02-25 8:09 ` Masami Hiramatsu 2019-02-25 16:40 ` Steven Rostedt 0 siblings, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-25 8:09 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Sun, 24 Feb 2019 20:49:45 -0800 Andy Lutomirski <luto@kernel.org> wrote: > On Sun, Feb 24, 2019 at 6:40 PM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > On Sun, 24 Feb 2019 09:26:45 -0800 > > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > On Sun, Feb 24, 2019 at 7:18 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > > > > On Sat, 23 Feb 2019 20:38:03 -0800 > > > > Andy Lutomirski <luto@kernel.org> wrote: > > > > > > > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > > > > > as far as I know. > > > > > > > > Hmm, which might_sleep() would you pointed? What I talked was a > > > > WARN_ON_ONCE(!in_task()) in access_ok() on x86 (only!), and in_task() just > > > > checks preempt_count. > > > > > > So the in_task() check does kind of make sense. Using "access_ok()" > > > outside of task context is certainly an odd thing, for several > > > reasons. The main one being simply that outside of task context, the > > > whole "which task" question is open, and you don't know if the task is > > > the active one, and so it's not clear if whatever task you interrupt > > > might have done "set_fs()" or not. > > > > Ah I got it. Usual case access_ok() in IRQ handler is strange. > > > > > > > > So PeterZ isn't wrong: > > > > > > > I guess PeterZ assumed that access_ok() is used only with user space access > > > > APIs (e.g. copy_from_user) which can cause page-fault and locks mm (and might > > > > sleep :)), but now we are trying to use access_ok() with new functions which > > > > disables page-fault and just return -EFAULT. > > > > > > .. but in this case, if we do it all *within* code that saves and > > > restores the user access flag with get_fs/set_fs, access_ok() would be > > > ok and it doesn't have the above issue. > > > > > > So access_ok() in _general_ is absolutely not safe to do from > > > interrupts, but within the context of probing user memory from a > > > tracing event it just happens to be ok. > > > > Hmm, but user can specify user-memory access from the tracing event > > which is located in interrupt handler. So I understand that it is safe > > only if we correctly setup access flag with get_fs/set_fs, is that > > correct? > > > > > It would be lovely to have a special macro for this, and keep the > > > warning for the general case, but because this is a "every > > > architecture needs to build their own" it's probably too painful. > > > > Agreed. > > This should probably go with whatever effort makes nmi_uaccess_ok() > available on all architectures. That being said, how about just > making copy_from_user_nmi() work on all architectures, even if it just > fails unconditionally on some of them? I think even if we have copy_from_user_nmi(), we need something like nmi_uaccess_ok() because without it we can not correctly use __copy_from_user_inatomic()... Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 8:09 ` Masami Hiramatsu @ 2019-02-25 16:40 ` Steven Rostedt 2019-02-26 1:35 ` Masami Hiramatsu 0 siblings, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-25 16:40 UTC (permalink / raw) To: Masami Hiramatsu Cc: Andy Lutomirski, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Mon, 25 Feb 2019 17:09:45 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > > This should probably go with whatever effort makes nmi_uaccess_ok() > > available on all architectures. That being said, how about just > > making copy_from_user_nmi() work on all architectures, even if it just > > fails unconditionally on some of them? > > I think even if we have copy_from_user_nmi(), we need something like > nmi_uaccess_ok() because without it we can not correctly use > __copy_from_user_inatomic()... But wouldn't that just be part of the implementation of "copy_from_user_nmi()" as being in an NMI just assumes being inatomic? -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 16:40 ` Steven Rostedt @ 2019-02-26 1:35 ` Masami Hiramatsu 0 siblings, 0 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-26 1:35 UTC (permalink / raw) To: Steven Rostedt Cc: Andy Lutomirski, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook, Peter Zijlstra On Mon, 25 Feb 2019 11:40:18 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Mon, 25 Feb 2019 17:09:45 +0900 > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > This should probably go with whatever effort makes nmi_uaccess_ok() > > > available on all architectures. That being said, how about just > > > making copy_from_user_nmi() work on all architectures, even if it just > > > fails unconditionally on some of them? > > > > I think even if we have copy_from_user_nmi(), we need something like > > nmi_uaccess_ok() because without it we can not correctly use > > __copy_from_user_inatomic()... > > But wouldn't that just be part of the implementation of > "copy_from_user_nmi()" as being in an NMI just assumes being inatomic? Yes for copy_from_user_nmi(). But there are some other fundamental functions, like __get_user(). And when we optimize the loop in strncpy/strnlen from user in atomic, I think one nmi_access_ok() at entry is enough. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-24 17:26 ` Linus Torvalds 2019-02-25 2:40 ` Masami Hiramatsu @ 2019-02-25 8:33 ` Peter Zijlstra 2019-02-25 14:52 ` Peter Zijlstra 1 sibling, 1 reply; 85+ messages in thread From: Peter Zijlstra @ 2019-02-25 8:33 UTC (permalink / raw) To: Linus Torvalds Cc: Masami Hiramatsu, Andy Lutomirski, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook On Sun, Feb 24, 2019 at 09:26:45AM -0800, Linus Torvalds wrote: > PeterZ, do you remember the particular use case that triggered that > commit 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok() > context")? This one, if I'm not mistaken. --- commit ae31fe51a3cceaa0cabdb3058f69669ecb47f12e Author: Johannes Weiner <hannes@cmpxchg.org> Date: Tue Nov 22 10:57:42 2016 +0100 perf/x86: Restore TASK_SIZE check on frame pointer The following commit: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses") ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual access_ok() check. Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE, whereas the access_ok() uses whatever the current address limit of the task is. We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and then see vmalloc faults when we access what looks like pointers into vmalloc space: [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290 [] CPU: 3 PID: 3685731 Comm: sh Tainted: G W 4.6.0-5_fbk1_223_gdbf0f40 #1 [] Call Trace: [] <NMI> [<ffffffff814717d1>] dump_stack+0x4d/0x6c [] [<ffffffff81076e43>] __warn+0xd3/0xf0 [] [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20 [] [<ffffffff8104a899>] vmalloc_fault+0x289/0x290 [] [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490 [] [<ffffffff8104b70c>] do_page_fault+0xc/0x10 [] [<ffffffff81794e82>] page_fault+0x22/0x30 [] [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0 [] [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190 [] [<ffffffff811512c7>] perf_callchain+0x67/0x80 [] [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370 [] [<ffffffff8114e840>] perf_event_output+0x20/0x60 [] [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130 [] [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0 [] [<ffffffff8114f484>] perf_event_overflow+0x14/0x20 [] [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0 [] [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20 [] [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0 [] [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20 [] [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50 [] [<ffffffff8101ea31>] nmi_handle+0x61/0x110 [] [<ffffffff8101ef94>] default_do_nmi+0x44/0x110 [] [<ffffffff8101f13b>] do_nmi+0xdb/0x150 [] [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10 [] <<EOE>> <IRQ> [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0 Fix this by moving the valid_user_frame() check to before the uaccess that loads the return address and the pointer to the next frame. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Fixes: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses") Signed-off-by: Ingo Molnar <mingo@kernel.org> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index d31735f37ed7..9d4bf3ab049e 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2352,7 +2352,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent frame.next_frame = 0; frame.return_address = 0; - if (!access_ok(VERIFY_READ, fp, 8)) + if (!valid_user_frame(fp, sizeof(frame))) break; bytes = __copy_from_user_nmi(&frame.next_frame, fp, 4); @@ -2362,9 +2362,6 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent if (bytes != 0) break; - if (!valid_user_frame(fp, sizeof(frame))) - break; - perf_callchain_store(entry, cs_base + frame.return_address); fp = compat_ptr(ss_base + frame.next_frame); } @@ -2413,7 +2410,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs frame.next_frame = NULL; frame.return_address = 0; - if (!access_ok(VERIFY_READ, fp, sizeof(*fp) * 2)) + if (!valid_user_frame(fp, sizeof(frame))) break; bytes = __copy_from_user_nmi(&frame.next_frame, fp, sizeof(*fp)); @@ -2423,9 +2420,6 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs if (bytes != 0) break; - if (!valid_user_frame(fp, sizeof(frame))) - break; - perf_callchain_store(entry, frame.return_address); fp = (void __user *)frame.next_frame; } ^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 8:33 ` Peter Zijlstra @ 2019-02-25 14:52 ` Peter Zijlstra 0 siblings, 0 replies; 85+ messages in thread From: Peter Zijlstra @ 2019-02-25 14:52 UTC (permalink / raw) To: Linus Torvalds Cc: Masami Hiramatsu, Andy Lutomirski, Steven Rostedt, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn, Kees Cook On Mon, Feb 25, 2019 at 09:33:09AM +0100, Peter Zijlstra wrote: > On Sun, Feb 24, 2019 at 09:26:45AM -0800, Linus Torvalds wrote: > > PeterZ, do you remember the particular use case that triggered that > > commit 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok() > > context")? > > This one, if I'm not mistaken. > > --- > > commit ae31fe51a3cceaa0cabdb3058f69669ecb47f12e > Author: Johannes Weiner <hannes@cmpxchg.org> > Date: Tue Nov 22 10:57:42 2016 +0100 > > perf/x86: Restore TASK_SIZE check on frame pointer > > The following commit: > > 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses") > > ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual > access_ok() check. > > Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE, > whereas the access_ok() uses whatever the current address limit of the task is. > > We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and > then see vmalloc faults when we access what looks like pointers into vmalloc > space: Also note that this was before we did: commit 88b0193d9418c00340e45e0a913a0813bc6c8c96 Author: Will Deacon <will.deacon@arm.com> Date: Tue May 9 18:00:04 2017 +0100 perf/callchain: Force USER_DS when invoking perf_callchain_user() Perf can generate and record a user callchain in response to a synchronous request, such as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we can end up walking the user stack (and dereferencing/saving whatever we find there) without the protections usually afforded by checks such as access_ok. Rather than play whack-a-mole with each architecture's stack unwinding implementation, fix the root of the problem by ensuring that we force USER_DS when invoking perf_callchain_user from the perf core. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c index c04917cad1bf..1b2be63c8528 100644 --- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -229,12 +229,18 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user, } if (regs) { + mm_segment_t fs; + if (crosstask) goto exit_put; if (add_mark) perf_callchain_store_context(&ctx, PERF_CONTEXT_USER); + + fs = get_fs(); + set_fs(USER_DS); perf_callchain_user(&ctx, regs); + set_fs(fs); } } ^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-24 4:38 ` Andy Lutomirski 2019-02-24 15:17 ` Masami Hiramatsu @ 2019-02-25 16:48 ` Kees Cook 2019-02-25 16:58 ` Andy Lutomirski 1 sibling, 1 reply; 85+ messages in thread From: Kees Cook @ 2019-02-25 16:48 UTC (permalink / raw) To: Andy Lutomirski Cc: Steven Rostedt, Masami Hiramatsu, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn On Sat, Feb 23, 2019 at 8:38 PM Andy Lutomirski <luto@kernel.org> wrote: > > On Sat, Feb 23, 2019 at 4:44 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Sat, 23 Feb 2019 12:47:46 +0900 > > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > Since kprobes handler runs in IRQ context, we can not use access_ok() in it. > > > (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) > > > > Is it really IRQ context or exception context? That is, one > > (interrupts) happen for any task, but exceptions happen because of the > > software that is executed (like a breakpoint). Although you can have a > > kprobe trigger in an interrupt handler (where user access wouldn't make > > sense anyway). But there should be no problem with user access from an > > exception handler. > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > as far as I know. We do need to be aware of the userfaultfd case of getting held by userspace in the middle of a copy_*_user()... that's a whole other problem. -- Kees Cook ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 16:48 ` Kees Cook @ 2019-02-25 16:58 ` Andy Lutomirski 2019-02-25 17:07 ` Kees Cook 0 siblings, 1 reply; 85+ messages in thread From: Andy Lutomirski @ 2019-02-25 16:58 UTC (permalink / raw) To: Kees Cook Cc: Andy Lutomirski, Steven Rostedt, Masami Hiramatsu, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn On Mon, Feb 25, 2019 at 8:48 AM Kees Cook <keescook@chromium.org> wrote: > > On Sat, Feb 23, 2019 at 8:38 PM Andy Lutomirski <luto@kernel.org> wrote: > > > > On Sat, Feb 23, 2019 at 4:44 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > On Sat, 23 Feb 2019 12:47:46 +0900 > > > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > > > Since kprobes handler runs in IRQ context, we can not use access_ok() in it. > > > > (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) > > > > > > Is it really IRQ context or exception context? That is, one > > > (interrupts) happen for any task, but exceptions happen because of the > > > software that is executed (like a breakpoint). Although you can have a > > > kprobe trigger in an interrupt handler (where user access wouldn't make > > > sense anyway). But there should be no problem with user access from an > > > exception handler. > > > > > > > Can we just get rid of this might_sleep()? access_ok() doesn't sleep > > as far as I know. > > We do need to be aware of the userfaultfd case of getting held by > userspace in the middle of a copy_*_user()... that's a whole other > problem. > I sure hope that pagefault_disable() already takes care of this. Otherwise we have major problems already. ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-25 16:58 ` Andy Lutomirski @ 2019-02-25 17:07 ` Kees Cook 0 siblings, 0 replies; 85+ messages in thread From: Kees Cook @ 2019-02-25 17:07 UTC (permalink / raw) To: Andy Lutomirski Cc: Steven Rostedt, Masami Hiramatsu, Linus Torvalds, Linux List Kernel Mailing, Ingo Molnar, Andrew Morton, stable, Changbin Du, Jann Horn On Mon, Feb 25, 2019 at 8:58 AM Andy Lutomirski <luto@kernel.org> wrote: > I sure hope that pagefault_disable() already takes care of this. > Otherwise we have major problems already. Okay, cool. I missed that bit. :) -- Kees Cook ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-15 17:47 ` [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault Steven Rostedt 2019-02-15 17:55 ` Linus Torvalds @ 2019-02-21 7:52 ` Masami Hiramatsu 2019-02-21 14:36 ` Steven Rostedt 2019-02-23 14:48 ` Masami Hiramatsu 1 sibling, 2 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-21 7:52 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du On Fri, 15 Feb 2019 12:47:13 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > From: Changbin Du <changbin.du@gmail.com> > > The userspace can ask kprobe to intercept strings at any memory address, > including invalid kernel address. In this case, fetch_store_strlen() > would crash since it uses general usercopy function, and user access > functions are no longer allowed to access kernel memory. > > For example, we can crash the kernel by doing something as below: > > $ sudo kprobe 'p:do_sys_open +0(+0(%si)):string' > > [ 103.620391] BUG: GPF in non-whitelisted uaccess (non-canonical address?) > [ 103.622104] general protection fault: 0000 [#1] SMP PTI > [ 103.623424] CPU: 10 PID: 1046 Comm: cat Not tainted 5.0.0-rc3-00130-gd73aba1-dirty #96 > [ 103.625321] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-2-g628b2e6-dirty-20190104_103505-linux 04/01/2014 > [ 103.628284] RIP: 0010:process_fetch_insn+0x1ab/0x4b0 > [ 103.629518] Code: 10 83 80 28 2e 00 00 01 31 d2 31 ff 48 8b 74 24 28 eb 0c 81 fa ff 0f 00 00 7f 1c 85 c0 75 18 66 66 90 0f ae e8 48 63 > ca 89 f8 <8a> 0c 31 66 66 90 83 c2 01 84 c9 75 dc 89 54 24 34 89 44 24 28 48 > [ 103.634032] RSP: 0018:ffff88845eb37ce0 EFLAGS: 00010246 > [ 103.635312] RAX: 0000000000000000 RBX: ffff888456c4e5a8 RCX: 0000000000000000 > [ 103.637057] RDX: 0000000000000000 RSI: 2e646c2f6374652f RDI: 0000000000000000 > [ 103.638795] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > [ 103.640556] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 > [ 103.642297] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 103.644040] FS: 0000000000000000(0000) GS:ffff88846f000000(0000) knlGS:0000000000000000 > [ 103.646019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 103.647436] CR2: 00007ffc79758038 CR3: 0000000463360006 CR4: 0000000000020ee0 > [ 103.649147] Call Trace: > [ 103.649781] ? sched_clock_cpu+0xc/0xa0 > [ 103.650747] ? do_sys_open+0x5/0x220 > [ 103.651635] kprobe_trace_func+0x303/0x380 > [ 103.652645] ? do_sys_open+0x5/0x220 > [ 103.653528] kprobe_dispatcher+0x45/0x50 > [ 103.654682] ? do_sys_open+0x1/0x220 > [ 103.655875] kprobe_ftrace_handler+0x90/0xf0 > [ 103.657282] ftrace_ops_assist_func+0x54/0xf0 > [ 103.658564] ? __call_rcu+0x1dc/0x280 > [ 103.659482] 0xffffffffc00000bf > [ 103.660384] ? __ia32_sys_open+0x20/0x20 > [ 103.661682] ? do_sys_open+0x1/0x220 > [ 103.662863] do_sys_open+0x5/0x220 > [ 103.663988] do_syscall_64+0x60/0x210 > [ 103.665201] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 103.666862] RIP: 0033:0x7fc22fadccdd > [ 103.668034] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff > ff 0f 05 <48> 3d 00 f0 ff ff 77 33 f3 c3 66 0f 1f 84 00 00 00 00 00 48 8d 44 > [ 103.674029] RSP: 002b:00007ffc7972c3a8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 > [ 103.676512] RAX: ffffffffffffffda RBX: 0000562f86147a21 RCX: 00007fc22fadccdd > [ 103.678853] RDX: 0000000000080000 RSI: 00007fc22fae1428 RDI: 00000000ffffff9c > [ 103.681151] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000 > [ 103.683489] R10: 0000000000000000 R11: 0000000000000287 R12: 00007fc22fce90a8 > [ 103.685774] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 > [ 103.688056] Modules linked in: > [ 103.689131] ---[ end trace 43792035c28984a1 ]--- > > This can be fixed by using probe_mem_read() instead, as it can handle faulting > kernel memory addresses, which kprobes can legitimately do. Basically OK to me. Could you use probe_kernel_read() in this context, since probe_mem_read() is a wrapper function for template code. With that change, Acked-by: Masami Hiramatsu <mhiramat@kernel.org> And for the long term, I need to find more efficient (or smarter) way to do it, like strnlen_user() does. Thank you, > > Link: http://lkml.kernel.org/r/20190125151051.7381-1-changbin.du@gmail.com > > Cc: stable@vger.kernel.org > Fixes: 9da3f2b7405 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") > Signed-off-by: Changbin Du <changbin.du@gmail.com> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > --- > kernel/trace/trace_kprobe.c | 10 +--------- > 1 file changed, 1 insertion(+), 9 deletions(-) > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > index d5fb09ebba8b..9eaf07f99212 100644 > --- a/kernel/trace/trace_kprobe.c > +++ b/kernel/trace/trace_kprobe.c > @@ -861,22 +861,14 @@ static const struct file_operations kprobe_profile_ops = { > static nokprobe_inline int > fetch_store_strlen(unsigned long addr) > { > - mm_segment_t old_fs; > int ret, len = 0; > u8 c; > > - old_fs = get_fs(); > - set_fs(KERNEL_DS); > - pagefault_disable(); > - > do { > - ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1); > + ret = probe_mem_read(&c, (u8 *)addr + len, 1); > len++; > } while (c && ret == 0 && len < MAX_STRING_SIZE); > > - pagefault_enable(); > - set_fs(old_fs); > - > return (ret < 0) ? ret : len; > } > > -- > 2.20.1 > > -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-21 7:52 ` Masami Hiramatsu @ 2019-02-21 14:36 ` Steven Rostedt 2019-02-21 15:58 ` Masami Hiramatsu 2019-02-23 14:48 ` Masami Hiramatsu 1 sibling, 1 reply; 85+ messages in thread From: Steven Rostedt @ 2019-02-21 14:36 UTC (permalink / raw) To: Masami Hiramatsu Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du On Thu, 21 Feb 2019 16:52:52 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > Basically OK to me. > Could you use probe_kernel_read() in this context, since probe_mem_read() is a > wrapper function for template code. > > With that change, > > Acked-by: Masami Hiramatsu <mhiramat@kernel.org> This already hit Linus's tree. I was able to reproduce the crash, so I streamlined it. I should have still pushed more for your ack first. Sorry about that. For some reason, I thought the change was in the generic probe code, and accepted the probe_mem_read(). Anyway, did you want to send a patch to change it to probe_kernel_read(), for the merge window? > > And for the long term, I need to find more efficient (or smarter) way to do it, > like strnlen_user() does. Agreed. Thanks, -- Steve ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-21 14:36 ` Steven Rostedt @ 2019-02-21 15:58 ` Masami Hiramatsu 2019-02-21 16:16 ` Masami Hiramatsu 0 siblings, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-21 15:58 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du On Thu, 21 Feb 2019 09:36:25 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 21 Feb 2019 16:52:52 +0900 > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > Basically OK to me. > > Could you use probe_kernel_read() in this context, since probe_mem_read() is a > > wrapper function for template code. > > > > With that change, > > > > Acked-by: Masami Hiramatsu <mhiramat@kernel.org> > > This already hit Linus's tree. I was able to reproduce the crash, so I > streamlined it. I should have still pushed more for your ack first. > Sorry about that. Oh, never mind. That seems urgent issue for kprobe event. Thank you very much for fixing it! > For some reason, I thought the change was in the generic probe code, > and accepted the probe_mem_read(). Anyway, did you want to send a patch > to change it to probe_kernel_read(), for the merge window? No problem. > > And for the long term, I need to find more efficient (or smarter) way to do it, > > like strnlen_user() does. > > Agreed. Thank you, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-21 15:58 ` Masami Hiramatsu @ 2019-02-21 16:16 ` Masami Hiramatsu 2019-02-21 16:32 ` Steven Rostedt 0 siblings, 1 reply; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-21 16:16 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du Hi Steve, On Fri, 22 Feb 2019 00:58:12 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > On Thu, 21 Feb 2019 09:36:25 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Thu, 21 Feb 2019 16:52:52 +0900 > > Masami Hiramatsu <mhiramat@kernel.org> wrote: > > > > > Basically OK to me. > > > Could you use probe_kernel_read() in this context, since probe_mem_read() is a > > > wrapper function for template code. > > > > > > With that change, > > > > > > Acked-by: Masami Hiramatsu <mhiramat@kernel.org> > > > > This already hit Linus's tree. I was able to reproduce the crash, so I > > streamlined it. I should have still pushed more for your ack first. > > Sorry about that. > > Oh, never mind. That seems urgent issue for kprobe event. > Thank you very much for fixing it! > > > For some reason, I thought the change was in the generic probe code, > > and accepted the probe_mem_read(). Anyway, did you want to send a patch > > to change it to probe_kernel_read(), for the merge window? > > No problem. Oops, I mean No, not yet. but it is a simple and cosmetic patch like below. Feel free to merge it to ftrace/core. --------- tracing/kprobes: Use probe_kernel_read instead of probe_mem_read From: Masami Hiramatsu <mhiramat@kernel.org> Use probe_kernel_read() instead of probe_mem_read() because probe_mem_read() is a kind of wrapper for switching memory read function between uprobes and kprobes. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- kernel/trace/trace_kprobe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 9eaf07f99212..99592c27465e 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -865,7 +865,7 @@ fetch_store_strlen(unsigned long addr) u8 c; do { - ret = probe_mem_read(&c, (u8 *)addr + len, 1); + ret = probe_kernel_read(&c, (u8 *)addr + len, 1); len++; } while (c && ret == 0 && len < MAX_STRING_SIZE); -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-21 16:16 ` Masami Hiramatsu @ 2019-02-21 16:32 ` Steven Rostedt 0 siblings, 0 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-21 16:32 UTC (permalink / raw) To: Masami Hiramatsu Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du [ Linus, you may want to read this ] On Fri, 22 Feb 2019 01:16:43 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: Yep, I'll take this patch. Hmm, my for-next isn't based on my urgent branch, and this would go on top of urgent. I can do one of three things to push this to Linus for the merge window: 1) Create a separate branch to add updates for on my urgent branch. This would have me do two pull requests to Linus. 2) Cherry pick the urgent commit that this updates. But that has a stable tag, which could confuse things as it would create the same commit twice, both with stable tags (I could take the stable tag off of the cherry pick though). 3) Merge the urgent branch into my for-next branch at that commit, with a message to why I'm doing this. The urgent branch is based off of a older tag in Linus's tree, so that would only pull in tracing changes (my stuff), and wont pull in anyone else's work. I may go with option 3, because I believe this may be one of the cases that is allowed to have merges in pull requests. a) Only my stuff got merged b) Have a dependency on something that already went into Linus's tree (prevent me from having to rebase already tested work). c) Have it documented in the merge commit to why its being merged Unless Linus feels otherwise (use one of the other options?), I may go ahead and do that. -- Steve > --------- > tracing/kprobes: Use probe_kernel_read instead of probe_mem_read > > From: Masami Hiramatsu <mhiramat@kernel.org> > > Use probe_kernel_read() instead of probe_mem_read() because > probe_mem_read() is a kind of wrapper for switching memory > read function between uprobes and kprobes. > > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> > --- > kernel/trace/trace_kprobe.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > index 9eaf07f99212..99592c27465e 100644 > --- a/kernel/trace/trace_kprobe.c > +++ b/kernel/trace/trace_kprobe.c > @@ -865,7 +865,7 @@ fetch_store_strlen(unsigned long addr) > u8 c; > > do { > - ret = probe_mem_read(&c, (u8 *)addr + len, 1); > + ret = probe_kernel_read(&c, (u8 *)addr + len, 1); > len++; > } while (c && ret == 0 && len < MAX_STRING_SIZE); > > ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault 2019-02-21 7:52 ` Masami Hiramatsu 2019-02-21 14:36 ` Steven Rostedt @ 2019-02-23 14:48 ` Masami Hiramatsu 1 sibling, 0 replies; 85+ messages in thread From: Masami Hiramatsu @ 2019-02-23 14:48 UTC (permalink / raw) To: Masami Hiramatsu Cc: Steven Rostedt, linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Changbin Du Hi, On Thu, 21 Feb 2019 16:52:52 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > > --- > > kernel/trace/trace_kprobe.c | 10 +--------- > > 1 file changed, 1 insertion(+), 9 deletions(-) > > > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > > index d5fb09ebba8b..9eaf07f99212 100644 > > --- a/kernel/trace/trace_kprobe.c > > +++ b/kernel/trace/trace_kprobe.c > > @@ -861,22 +861,14 @@ static const struct file_operations kprobe_profile_ops = { > > static nokprobe_inline int > > fetch_store_strlen(unsigned long addr) > > { > > - mm_segment_t old_fs; > > int ret, len = 0; > > u8 c; > > > > - old_fs = get_fs(); > > - set_fs(KERNEL_DS); > > - pagefault_disable(); > > - BTW, compared with probe_kernel_read() implementation, this function lacks current->kernel_uaccess_faults_ok modification here. I would like to know whether we can avoid this issue if we tweak this flag. Thank you, > > do { > > - ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1); > > + ret = probe_mem_read(&c, (u8 *)addr + len, 1); > > len++; > > } while (c && ret == 0 && len < MAX_STRING_SIZE); > > > > - pagefault_enable(); > > - set_fs(old_fs); > > - > > return (ret < 0) ? ret : len; > > } > > > > -- > > 2.20.1 > > > > > > > -- > Masami Hiramatsu <mhiramat@kernel.org> -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 85+ messages in thread
* [PATCH 2/2 v2] tracing: Fix number of entries in trace header 2019-02-15 17:47 [PATCH 0/2 v2] [GIT PULL (take two)] tracing: Two more fixes Steven Rostedt 2019-02-15 17:47 ` [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault Steven Rostedt @ 2019-02-15 17:47 ` Steven Rostedt 1 sibling, 0 replies; 85+ messages in thread From: Steven Rostedt @ 2019-02-15 17:47 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Joel Fernandes, stable, Quentin Perret From: Quentin Perret <quentin.perret@arm.com> The following commit 441dae8f2f29 ("tracing: Add support for display of tgid in trace output") removed the call to print_event_info() from print_func_help_header_irq() which results in the ftrace header not reporting the number of entries written in the buffer. As this wasn't the original intent of the patch, re-introduce the call to print_event_info() to restore the orginal behaviour. Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com Acked-by: Joel Fernandes <joelaf@google.com> Cc: stable@vger.kernel.org Fixes: 441dae8f2f29 ("tracing: Add support for display of tgid in trace output") Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> --- kernel/trace/trace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index c521b7347482..c4238b441624 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3384,6 +3384,8 @@ static void print_func_help_header_irq(struct trace_buffer *buf, struct seq_file const char tgid_space[] = " "; const char space[] = " "; + print_event_info(buf, m); + seq_printf(m, "# %s _-----=> irqs-off\n", tgid ? tgid_space : space); seq_printf(m, "# %s / _----=> need-resched\n", -- 2.20.1 ^ permalink raw reply related [flat|nested] 85+ messages in thread
end of thread, other threads:[~2019-02-28 15:18 UTC | newest] Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-15 17:47 [PATCH 0/2 v2] [GIT PULL (take two)] tracing: Two more fixes Steven Rostedt 2019-02-15 17:47 ` [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault Steven Rostedt 2019-02-15 17:55 ` Linus Torvalds 2019-02-15 22:15 ` Steven Rostedt 2019-02-15 23:49 ` Andy Lutomirski 2019-02-16 0:19 ` Steven Rostedt 2019-02-16 1:32 ` Andy Lutomirski 2019-02-16 2:08 ` Steven Rostedt 2019-02-16 2:14 ` Andy Lutomirski 2019-02-16 2:21 ` Steven Rostedt 2019-02-18 17:58 ` Linus Torvalds 2019-02-18 18:23 ` Linus Torvalds 2019-02-19 16:18 ` Steven Rostedt 2019-02-19 18:43 ` Linus Torvalds 2019-02-19 19:03 ` Steven Rostedt 2019-02-20 8:10 ` Masami Hiramatsu 2019-02-20 13:57 ` Jann Horn 2019-02-20 14:47 ` Steven Rostedt 2019-02-20 15:08 ` Masami Hiramatsu 2019-02-20 14:49 ` Steven Rostedt 2019-02-20 16:04 ` Masami Hiramatsu 2019-02-20 16:42 ` Steven Rostedt 2019-02-21 7:37 ` Masami Hiramatsu 2019-02-22 8:27 ` Masami Hiramatsu 2019-02-22 8:35 ` Masami Hiramatsu 2019-02-22 17:43 ` Linus Torvalds 2019-02-22 17:48 ` Andy Lutomirski 2019-02-22 18:28 ` Linus Torvalds 2019-02-22 19:52 ` Andy Lutomirski 2019-02-22 19:27 ` Alexei Starovoitov 2019-02-22 19:30 ` Steven Rostedt 2019-02-22 19:34 ` Alexei Starovoitov 2019-02-22 19:39 ` Steven Rostedt 2019-02-22 19:55 ` Andy Lutomirski 2019-02-22 21:43 ` Jann Horn 2019-02-22 22:08 ` Nadav Amit 2019-02-22 22:17 ` Jann Horn 2019-02-22 22:21 ` Nadav Amit 2019-02-22 22:39 ` Nadav Amit 2019-02-22 23:02 ` Jann Horn 2019-02-22 23:22 ` Nadav Amit 2019-02-22 23:59 ` Andy Lutomirski 2019-02-23 0:03 ` Alexei Starovoitov 2019-02-23 0:15 ` Nadav Amit 2019-02-24 19:35 ` Andy Lutomirski 2019-02-25 13:36 ` Masami Hiramatsu 2019-02-22 21:20 ` Linus Torvalds 2019-02-22 21:38 ` David Miller 2019-02-22 21:59 ` Linus Torvalds 2019-02-22 22:51 ` Alexei Starovoitov 2019-02-22 23:11 ` Jann Horn 2019-02-22 23:16 ` David Miller 2019-02-22 23:16 ` Linus Torvalds 2019-02-22 23:56 ` Alexei Starovoitov 2019-02-23 0:08 ` Linus Torvalds 2019-02-23 2:28 ` Alexei Starovoitov 2019-02-23 2:32 ` Linus Torvalds 2019-02-23 3:02 ` Steven Rostedt 2019-02-23 4:51 ` Masami Hiramatsu 2019-02-26 3:57 ` Christoph Hellwig 2019-02-26 15:24 ` Joel Fernandes 2019-02-28 12:29 ` Masami Hiramatsu 2019-02-28 15:18 ` Joel Fernandes 2019-02-23 3:47 ` Masami Hiramatsu 2019-02-24 0:44 ` Steven Rostedt 2019-02-24 4:38 ` Andy Lutomirski 2019-02-24 15:17 ` Masami Hiramatsu 2019-02-24 17:26 ` Linus Torvalds 2019-02-25 2:40 ` Masami Hiramatsu 2019-02-25 4:49 ` Andy Lutomirski 2019-02-25 8:09 ` Masami Hiramatsu 2019-02-25 16:40 ` Steven Rostedt 2019-02-26 1:35 ` Masami Hiramatsu 2019-02-25 8:33 ` Peter Zijlstra 2019-02-25 14:52 ` Peter Zijlstra 2019-02-25 16:48 ` Kees Cook 2019-02-25 16:58 ` Andy Lutomirski 2019-02-25 17:07 ` Kees Cook 2019-02-21 7:52 ` Masami Hiramatsu 2019-02-21 14:36 ` Steven Rostedt 2019-02-21 15:58 ` Masami Hiramatsu 2019-02-21 16:16 ` Masami Hiramatsu 2019-02-21 16:32 ` Steven Rostedt 2019-02-23 14:48 ` Masami Hiramatsu 2019-02-15 17:47 ` [PATCH 2/2 v2] tracing: Fix number of entries in trace header Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).