From: Nadav Amit <namit@vmware.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Changbin Du <changbin.du@gmail.com>,
Kees Cook <keescook@chromium.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Network Development <netdev@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Igor Stoppa <igor.stoppa@gmail.com>
Subject: Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
Date: Sat, 23 Feb 2019 00:15:56 +0000
Message-ID: <7E4E5362-4C13-40FA-B5A8-F88D3E60827F@vmware.com>
In-Reply-To: <CALCETrU2V5KBA97g3O-yHiUu1acmM_K9b2a5ATmKSqGwdd-+Dw@mail.gmail.com>
> On Feb 22, 2019, at 3:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> On Fri, Feb 22, 2019 at 3:02 PM Jann Horn <jannh@google.com> wrote:
>> On Fri, Feb 22, 2019 at 11:39 PM Nadav Amit <namit@vmware.com> wrote:
>>>> On Feb 22, 2019, at 2:21 PM, Nadav Amit <namit@vmware.com> wrote:
>>>>
>>>>> On Feb 22, 2019, at 2:17 PM, Jann Horn <jannh@google.com> wrote:
>>>>>
>>>>> On Fri, Feb 22, 2019 at 11:08 PM Nadav Amit <namit@vmware.com> wrote:
>>>>>>> On Feb 22, 2019, at 1:43 PM, Jann Horn <jannh@google.com> wrote:
>>>>>>>
>>>>>>> (adding some people from the text_poke series to the thread, removing stable@)
>>>>>>>
>>>>>>> On Fri, Feb 22, 2019 at 8:55 PM Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>>>>> On Feb 22, 2019, at 11:34 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>>>>>>>> On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote:
>>>>>>>>>> On Fri, 22 Feb 2019 11:27:05 -0800
>>>>>>>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Then we should still probably fix up "__probe_kernel_read()" to not
>>>>>>>>>>>> allow user accesses. The easiest way to do that is actually likely to
>>>>>>>>>>>> use the "unsafe_get_user()" functions *without* doing a
>>>>>>>>>>>> uaccess_begin(), which will mean that modern CPUs will simply fault
>>>>>>>>>>>> on a kernel access to user space.
>>>>>>>>>>>
>>>>>>>>>>> On bpf side the bpf_probe_read() helper just calls probe_kernel_read()
>>>>>>>>>>> and users pass both user and kernel addresses into it and expect
>>>>>>>>>>> that the helper will actually try to read from that address.
>>>>>>>>>>>
>>>>>>>>>>> If __probe_kernel_read will suddenly start failing on all user addresses
>>>>>>>>>>> it will break the expectations.
>>>>>>>>>>> How do we solve it in bpf_probe_read?
>>>>>>>>>>> Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte
>>>>>>>>>>> in the loop?
>>>>>>>>>>> That's doable, but people already complain that bpf_probe_read() is slow
>>>>>>>>>>> and shows up in their perf report.
>>>>>>>>>>
>>>>>>>>>> We're changing kprobes to add a specific flag to say that we want to
>>>>>>>>>> differentiate between kernel or user reads. Can this be done with
>>>>>>>>>> bpf_probe_read()? If it's showing up in perf report, I doubt a single
>>>>>>>>>
>>>>>>>>> so you're saying you will break existing kprobe scripts?
>>>>>>>>> I don't think it's a good idea.
>>>>>>>>> It's not acceptable to break bpf_probe_read uapi.
>>>>>>>>
>>>>>>>> If so, the uapi is wrong: a long-sized number does not reliably identify an address if you don’t separately know whether it’s a user or kernel address. s390x and 4G:4G x86_32 are the notable exceptions. I have lobbied for RISC-V and future x86_64 to join the crowd. I don’t know whether I’ll win this fight, but the uapi will probably have to change for at least s390x.
>>>>>>>>
>>>>>>>> What to do about existing scripts is a different question.
>>>>>>>
>>>>>>> This lack of logical separation between user and kernel addresses
>>>>>>> might interact interestingly with the text_poke series, specifically
>>>>>>> "[PATCH v3 05/20] x86/alternative: Initialize temporary mm for
>>>>>>> patching" (https://lore.kernel.org/lkml/20190221234451.17632-6-rick.p.edgecombe@intel.com/)
>>>>>>> and "[PATCH v3 06/20] x86/alternative: Use temporary mm for text
>>>>>>> poking" (https://lore.kernel.org/lkml/20190221234451.17632-7-rick.p.edgecombe@intel.com/),
>>>>>>> right? If someone manages to get a tracing BPF program to trigger in a
>>>>>>> task that has switched to the patching mm, could they use
>>>>>>> bpf_probe_write_user() - which uses probe_kernel_write() after
>>>>>>> checking that KERNEL_DS isn't active and that access_ok() passes - to
>>>>>>> overwrite kernel text that is mapped writable in the patching mm?
>>>>>>
>>>>>> Yes, this is a good point. I guess text_poke() should be defined with
>>>>>> “__kprobes” and open-code memcpy.
>>>>>>
>>>>>> Does it sound reasonable?
>>>>>
>>>>> Doesn't __text_poke() as implemented in the proposed patch use a
>>>>> couple other kernel functions, too? Like switch_mm_irqs_off() and
>>>>> pte_clear() (which can be a call into a separate function on paravirt
>>>>> kernels)?
>>>>
>>>> I will move the pte_clear() to be done after the poking mm was unloaded.
>>>> Give me a few minutes to send a sketch of what I think should be done.
>>>
>>> Err.. You are right, I don’t see an easy way of preventing a kprobe from
>>> being set on switch_mm_irqs_off(), and open-coding this monster is too ugly.
>>>
>>> The reasonable solution, it seems to me, is to take all the relevant
>>> pieces of code (and data) that might be used during text-poking and
>>> encapsulate them, so that they are placed in a memory area that cannot be
>>> kprobe'd. This can also be useful to write-protect the data structures of
>>> code that calls text_poke(), e.g., static-keys. It can also protect data
>>> on the stack that is used during text_poke() from being overwritten from
>>> another core.
>>>
>>> This solution is somewhat similar to Igor Stoppa’s idea of using “enclaves”
>>> when doing write-rarely operations.
>>>
>>> Right now, I think that text_poke() will keep being susceptible to such
>>> an attack, unless you have a better suggestion.
>>
>> A relatively simple approach might be to teach BPF not to run kprobe
>> programs and such in contexts where current->mm isn't the active mm?
>> Maybe using nmi_uaccess_okay(), or something like that? It looks like
>> perf_callchain_user() also already uses that. Except that a lot of
>> this code is x86-specific...
>
> This sounds like exactly the right solution. If you're running from
> some unknown context (like NMI or tracing), then you should check
> nmi_uaccess_okay(). I think we should just promote that to be a
> non-arch-specific function (that returns true by default) and check it
> in the relevant bpf_probe_xyz() functions.

I can do that, but notice that switch_mm_irqs_off() writes to
cpu_tlbstate.loaded_mm before it actually writes to CR3. So there are still
a couple of instructions (and the call to load_new_mm_cr3()) in between
on which a kprobe can be set, no?

I can mark them as non-kprobable.
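
To make the window concrete, here is a toy userspace model of it. This is
not kernel code and every name in it (toy_mm, toy_cpu, toy_switch_publish(),
toy_switch_load_cr3(), toy_uaccess_okay(), toy_probe_in_window()) is
hypothetical. It assumes the ordering described above, i.e. that the
bookkeeping field is updated before the page tables are loaded, and that an
nmi_uaccess_okay()-style check consults only that field. Neither assumption
has been checked against a particular tree.

```c
/* Toy userspace model of the window; NOT kernel code, all names hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm {
	int id;
};

struct toy_cpu {
	const struct toy_mm *loaded_mm;	/* stands in for cpu_tlbstate.loaded_mm */
	const struct toy_mm *cr3_mm;	/* address space the CPU really uses */
};

/* Step 1 of the modeled switch: publish the new mm in the bookkeeping. */
static void toy_switch_publish(struct toy_cpu *cpu, const struct toy_mm *next)
{
	assert(cpu != NULL && next != NULL);
	cpu->loaded_mm = next;
}

/* Step 2 of the modeled switch: load the new page tables. */
static void toy_switch_load_cr3(struct toy_cpu *cpu, const struct toy_mm *next)
{
	assert(cpu != NULL && next != NULL);
	cpu->cr3_mm = next;
}

/* Assumed check: looks only at the bookkeeping, never at the real CR3. */
static bool toy_uaccess_okay(const struct toy_cpu *cpu,
			     const struct toy_mm *current_mm)
{
	return cpu->loaded_mm == current_mm;
}

/* True if the check passes while a different mm is still the active one. */
static bool toy_probe_in_window(const struct toy_cpu *cpu,
				const struct toy_mm *current_mm)
{
	return toy_uaccess_okay(cpu, current_mm) && cpu->cr3_mm != current_mm;
}
```

In this model, start with both fields pointing at the patching mm and then
switch back to the task's mm. A probe that fires between
toy_switch_publish() and toy_switch_load_cr3() passes the check although the
patching mm is still the active address space. Once toy_switch_load_cr3()
has run, the two fields agree again and the window is closed.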