linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	stable <stable@vger.kernel.org>,
	Changbin Du <changbin.du@gmail.com>, Jann Horn <jannh@google.com>,
	Kees Cook <keescook@chromium.org>,
	Andrew Lutomirski <luto@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Netdev <netdev@vger.kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
Date: Fri, 22 Feb 2019 18:28:53 -0800	[thread overview]
Message-ID: <20190223022850.nv4hnweueetprbot@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAHk-=whu_iwv4TQP5Xd62bTfWW-uswVecUbFeiSCuTWnH1xgPg@mail.gmail.com>

On Fri, Feb 22, 2019 at 04:08:59PM -0800, Linus Torvalds wrote:
> On Fri, Feb 22, 2019 at 3:56 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > It will preserve existing bpf_probe_read() behavior on x86.
> 
> ... but that's the worst possible situation.
> 
> It appears that people haven't understood that kernel and user
> addresses are distinct, and may have written programs that are
> fundamentally buggy.
> 
> And we _want_ to make it clear that they are buggy on x86-64, exactly
> because x86-64 is the one that gets the most testing - by far.
> 
> So if x86-64 continues working for buggy programs, then that only
> means that those bugs never get fixed.
> 
> It would be much better to try to get those things fixed, and make the
> x86-64 implementation stricter, exactly so that people end up
> _realizing_ that they can't just think "a pointer is a pointer, and
> the context doesn't matter".
>
> From a pure functional safety standpoint, I thought bpf already knew
> what kind of a pointer it had?

when bpf verifier knows the type of pointer it allows direct access to it.
That's the case for skb, socket, packet data, hash maps, arrays, stack, etc.
Networking progs cannot call bpf_probe_read().
It's available to tracing progs only and their goal is to walk the kernel and
user memory with addresses that cannot be statically verified
at program load time.
We are working on adding type information (BTF) to vmlinux.
Soon we'll be able to tell the type of every kernel function argument.
Right now arg1 and arg2 in a kprobed function are just u64 pt_regs->di, si.
Soon we'll be able to precisely identify their C type.

I completely agree on the direction that x86 is the architecture that
sets an example and users need to learn the difference in pointers.
I only disagree on timing.
Right now users don't have an alternative.
In our repo I counted ~400 calls to bpf_probe_read and about 10 times more
'indirect' calls. What's happening with 'indirect' calls is BCC toolchain
using clang to automatically replace struct_a->field_foo access with
bpf_probe_read(struct_a + offsetof(typeof(struct_a), field_foo));

If we had __user vs __kernel annotation available to clang we could have
taught BCC to replace this '->' dereference with appropriate kernel vs user
helper. Also we need to teach GCC to recognize __user and store into dwarf,
so we can take it further into BTF and verify later.

Also I think disallowing bpf_probe_read() to read user pointer will not
make a desired teaching effect. It will only cause painful debugging to folks
when their progs will stop working. It's better to remove bpf_probe_read()
completely.
imo the process of teaching the users of kernel vs user pointer difference
needs to be gradual.
First we introduce bpf_probe_kernel_read and bpf_probe_user_read and
introduce clang/gcc tooling to catch the mistakes.
Going over this 400+ places and manually grepping kernel sources
for __user keyword is not a great proposal if we want to keep those users.
Once we have this working we can remove bpf_probe_read() altogether.
Rejecting bpf prog at load time is a clear signal that user has to fix it
(instead of changing run-time behavior).
When the verifier gets even smarter it could potentially replace prob_read
with probe_kernel_read and probe_user_read when it has that type info.

imo this kernel release should finish as-is and in the next cycle we can
change probe_kernel_read() to reject user address, have temporary
workaround in bpf_probe_read() with probe_kernel_read+get_user hack,
introduce new bpf helpers, new tooling and eventually remove
buggy bpf_probe_read.


  reply	other threads:[~2019-02-23  2:28 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-15 17:47 [PATCH 0/2 v2] [GIT PULL (take two)] tracing: Two more fixes Steven Rostedt
2019-02-15 17:47 ` [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault Steven Rostedt
2019-02-15 17:55   ` Linus Torvalds
2019-02-15 22:15     ` Steven Rostedt
2019-02-15 23:49       ` Andy Lutomirski
2019-02-16  0:19         ` Steven Rostedt
2019-02-16  1:32           ` Andy Lutomirski
2019-02-16  2:08             ` Steven Rostedt
2019-02-16  2:14               ` Andy Lutomirski
2019-02-16  2:21                 ` Steven Rostedt
2019-02-18 17:58           ` Linus Torvalds
2019-02-18 18:23             ` Linus Torvalds
2019-02-19 16:18               ` Steven Rostedt
2019-02-19 18:43                 ` Linus Torvalds
2019-02-19 19:03                   ` Steven Rostedt
2019-02-20  8:10                     ` Masami Hiramatsu
2019-02-20 13:57                       ` Jann Horn
2019-02-20 14:47                         ` Steven Rostedt
2019-02-20 15:08                         ` Masami Hiramatsu
2019-02-20 14:49                       ` Steven Rostedt
2019-02-20 16:04                         ` Masami Hiramatsu
2019-02-20 16:42                           ` Steven Rostedt
2019-02-21  7:37                             ` Masami Hiramatsu
2019-02-22  8:27                         ` Masami Hiramatsu
2019-02-22  8:35                           ` Masami Hiramatsu
2019-02-22 17:43                             ` Linus Torvalds
2019-02-22 17:48                               ` Andy Lutomirski
2019-02-22 18:28                                 ` Linus Torvalds
2019-02-22 19:52                                   ` Andy Lutomirski
2019-02-22 19:27                               ` Alexei Starovoitov
2019-02-22 19:30                                 ` Steven Rostedt
2019-02-22 19:34                                   ` Alexei Starovoitov
2019-02-22 19:39                                     ` Steven Rostedt
2019-02-22 19:55                                     ` Andy Lutomirski
2019-02-22 21:43                                       ` Jann Horn
2019-02-22 22:08                                         ` Nadav Amit
2019-02-22 22:17                                           ` Jann Horn
2019-02-22 22:21                                             ` Nadav Amit
2019-02-22 22:39                                               ` Nadav Amit
2019-02-22 23:02                                                 ` Jann Horn
2019-02-22 23:22                                                   ` Nadav Amit
2019-02-22 23:59                                                   ` Andy Lutomirski
2019-02-23  0:03                                                     ` Alexei Starovoitov
2019-02-23  0:15                                                     ` Nadav Amit
2019-02-24 19:35                                                       ` Andy Lutomirski
2019-02-25 13:36                                                     ` Masami Hiramatsu
2019-02-22 21:20                                 ` Linus Torvalds
2019-02-22 21:38                                   ` David Miller
2019-02-22 21:59                                     ` Linus Torvalds
2019-02-22 22:51                                       ` Alexei Starovoitov
2019-02-22 23:11                                         ` Jann Horn
2019-02-22 23:16                                           ` David Miller
2019-02-22 23:16                                         ` Linus Torvalds
2019-02-22 23:56                                           ` Alexei Starovoitov
2019-02-23  0:08                                             ` Linus Torvalds
2019-02-23  2:28                                               ` Alexei Starovoitov [this message]
2019-02-23  2:32                                                 ` Linus Torvalds
2019-02-23  3:02                                                 ` Steven Rostedt
2019-02-23  4:51                                             ` Masami Hiramatsu
2019-02-26  3:57                                       ` Christoph Hellwig
2019-02-26 15:24                                 ` Joel Fernandes
2019-02-28 12:29                                   ` Masami Hiramatsu
2019-02-28 15:18                                     ` Joel Fernandes
2019-02-23  3:47                               ` Masami Hiramatsu
2019-02-24  0:44                                 ` Steven Rostedt
2019-02-24  4:38                                   ` Andy Lutomirski
2019-02-24 15:17                                     ` Masami Hiramatsu
2019-02-24 17:26                                       ` Linus Torvalds
2019-02-25  2:40                                         ` Masami Hiramatsu
2019-02-25  4:49                                           ` Andy Lutomirski
2019-02-25  8:09                                             ` Masami Hiramatsu
2019-02-25 16:40                                               ` Steven Rostedt
2019-02-26  1:35                                                 ` Masami Hiramatsu
2019-02-25  8:33                                         ` Peter Zijlstra
2019-02-25 14:52                                           ` Peter Zijlstra
2019-02-25 16:48                                     ` Kees Cook
2019-02-25 16:58                                       ` Andy Lutomirski
2019-02-25 17:07                                         ` Kees Cook
2019-02-21  7:52   ` Masami Hiramatsu
2019-02-21 14:36     ` Steven Rostedt
2019-02-21 15:58       ` Masami Hiramatsu
2019-02-21 16:16         ` Masami Hiramatsu
2019-02-21 16:32           ` Steven Rostedt
2019-02-23 14:48     ` Masami Hiramatsu
2019-02-15 17:47 ` [PATCH 2/2 v2] tracing: Fix number of entries in trace header Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190223022850.nv4hnweueetprbot@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bpf@vger.kernel.org \
    --cc=changbin.du@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).