From: Masami Hiramatsu <mhiramat@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Ingo Molnar <mingo@kernel.org>
Cc: X86 ML <x86@kernel.org>, Masami Hiramatsu <mhiramat@kernel.org>,
	Daniel Xu <dxu@dxuuu.xyz>,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	kuba@kernel.org, mingo@redhat.com, ast@kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	kernel-team@fb.com, yhs@fb.com, linux-ia64@vger.kernel.org,
	Abhishek Sagar <sagar.abhishek@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Subject: [RFC PATCH 0/1] Non stack-intrusive return probe event
Date: Sun, 29 Aug 2021 23:22:14 +0900
Message-ID: <163024693462.457128.1437820221831758047.stgit@devnote2>
In-Reply-To: <162756755600.301564.4957591913842010341.stgit@devnote2>

Hello,

For a long time, we have been trying to fix several issues around
kretprobe. One of the latest efforts was the stacktrace fix on x86
in this thread.

https://lore.kernel.org/bpf/162756755600.301564.4957591913842010341.stgit@devnote2/

However, there seems to be no progress or further discussion. So I
would like to propose another approach for this (and the other issues).

Here is my idea: replace kretprobe with kprobe for return probing.
In other words, put a kprobe on the "return instruction" directly
instead of modifying the kernel stack. This can solve most
of the kretprobe disadvantages. For example:

- Since it doesn't change the kernel stack, no special stack
  unwinder fixup is needed anymore.
- No more "max-instance" limitation, because it uses kprobes
  directly.
- It scales as well as kprobes do, since there is no list
  operation at probe runtime.

Here is PoC code that introduces a "retinsn_probe" event as part
of the ftrace kprobe event. I don't think we need to replace
kretprobe itself. This should be a higher-layer feature, because
some kernel functions have multiple "return instructions", so
the "retinsn_probe" must manage multiple kprobes. That means the
"retinsn_probe" will be a user of kprobes. I decided to implement it
inside the ftrace "kprobe-event". This gives us another advantage
for eBPF support: because eBPF uses "kprobe-event" instead of
"kprobe" directly, if the "retinsn_probe" is implemented in the
"kprobe-event", eBPF can use it without any change.
Anyway, this can coexist with kretprobe. So as long as any
user still uses kretprobe, we can keep it.


Example
=======
For example, I ran the following shell script, which was used in the
stacktrace fix series.

----
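# Mount debugfs and move to the tracing directory.
mount -t debugfs debugfs /sys/kernel/debug/
cd /sys/kernel/debug/tracing
# Clear the trace buffer and show symbol offsets in the output.
echo > trace
echo 1 > options/sym-offset
# Define return probes on vfs_read and full_proxy_read.
echo r vfs_read >> kprobe_events
echo r full_proxy_read >> kprobe_events
# Stop tracing on the first vfs_read return, and dump a stacktrace
# on the first full_proxy_read return.
echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
# Enable the events, list the registered kprobes, then disable.
echo 1 > events/kprobes/enable
cat /sys/kernel/debug/kprobes/list
echo 0 > events/kprobes/enable
# Show the recorded trace.
cat trace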
----

This is the result.
----
ffffffff813b420e  k  full_proxy_read+0x6e    
ffffffff812b7c0a  k  vfs_read+0xda  
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
             cat-136     [007] d.Z.     8.038381: r_full_proxy_read_0: (vfs_read+0x9b/0x180 <- full_proxy_read)
             cat-136     [007] d.Z.     8.038386: <stack trace>
 => kretprobe_trace_func+0x209/0x300
 => retinsn_dispatcher+0x7a/0xa0
 => kprobe_post_process+0x28/0x80
 => kprobe_int3_handler+0x166/0x1a0
 => exc_int3+0x47/0x140
 => asm_exc_int3+0x31/0x40
 => vfs_read+0x9b/0x180
 => ksys_read+0x68/0xe0
 => do_syscall_64+0x3b/0x90
 => entry_SYSCALL_64_after_hwframe+0x44/0xae
             cat-136     [007] d.Z.     8.038387: r_vfs_read_0: (ksys_read+0x68/0xe0 <- vfs_read)
----

You can see that the return probe events are translated to kprobes
instead of kretprobes. Also, in the stacktrace, we can see that
an int3 invokes the kprobe and that the stacktrace is decoded correctly.
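
One way to check this mechanically is to look at the type column of
the kprobes list, where "k" marks a plain kprobe and "r" marks a
kretprobe. The following is a minimal sketch and not part of the PoC.
The helper name count_probe_types is made up for illustration, and it
reads the list from stdin so that it can be tried on the two lines
shown in the result above.

```shell
# Count probes per type in the /sys/kernel/debug/kprobes/list format.
# Column 2 is the probe type: "k" = kprobe, "r" = kretprobe.
# count_probe_types is a hypothetical helper name, not part of the PoC.
count_probe_types() {
    awk '
        $2 == "k" { k++ }
        $2 == "r" { r++ }
        END { printf "kprobes=%d kretprobes=%d\n", k, r }
    '
}

# The two list lines from the result above.
sample_list='ffffffff813b420e  k  full_proxy_read+0x6e
ffffffff812b7c0a  k  vfs_read+0xda'

printf '%s\n' "$sample_list" | count_probe_types
```

On a live system the same helper could be fed directly, e.g.
"count_probe_types < /sys/kernel/debug/kprobes/list".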


TODO
====
Of course, this is just PoC code, and there are many TODOs.

- This PoC code only supports x86 at this moment. But I think this
  can be done on the other architectures. What each one needs is
  an implementation of "find_return_instructions()".
- The code cleanup is incomplete. I still have to remove "kretprobe"
  from the "trace_kprobe" data structure, rewrite related functions, etc.
- It has to handle "tail-call" optimized code, which replaces
  a "call + return" with a "jump". find_return_instruction() should
  detect it and decode the jump destination too.
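
To illustrate why one return probe may need several kprobes, and why
tail calls need extra care, here is a rough userspace sketch. It is
not the kernel-side implementation. It scans objdump-style disassembly
text from stdin and prints the address of each "ret", plus each direct
"jmp" to the start of another symbol, which is treated here as a
possible tail call. The names list_exit_points and emit_sample, the
disassembly of a made-up function "foo", and the jmp heuristic are all
assumptions for illustration. Indirect jumps are not handled.

```shell
# Print candidate exit points of functions in "objdump -d" style text.
# list_exit_points is a hypothetical helper, not part of the PoC.
list_exit_points() {
    awk -F'\t' '
        # Function header line, e.g. "0000000000000000 <foo>:"
        /^[0-9a-f]+ <[^>]+>:$/ {
            cur = $0
            sub(/^[0-9a-f]+ </, "", cur)
            sub(/>:$/, "", cur)
            next
        }
        # Instruction line: "  addr:<TAB>bytes<TAB>mnemonic operands"
        NF >= 3 {
            addr = $1
            gsub(/[ :]/, "", addr)
            insn = $3
            sub(/[ \t]+$/, "", insn)
            if (insn ~ /^retq?( |$)/) {
                print addr, "ret"
            } else if (insn ~ /^jmpq? / && match(insn, /<[^+>]+>$/)) {
                # Target is "<sym>" with no "+offset": start of a symbol.
                target = substr(insn, RSTART + 1, RLENGTH - 2)
                if (target != cur)
                    print addr, "tailcall?"
            }
        }
    '
}

# Made-up disassembly of a hypothetical function "foo" with two
# return instructions and one tail call to "bar".
emit_sample() {
    printf '%b\n' \
        '0000000000000000 <foo>:' \
        '   0:\t85 ff                \ttest   %edi,%edi' \
        '   2:\t74 03                \tje     7 <foo+0x7>' \
        '   4:\t31 c0                \txor    %eax,%eax' \
        '   6:\tc3                   \tret' \
        '   7:\t83 ff 01             \tcmp    $0x1,%edi' \
        '   a:\t74 06                \tje     12 <foo+0x12>' \
        '   c:\tb8 01 00 00 00       \tmov    $0x1,%eax' \
        '  11:\tc3                   \tret' \
        '  12:\te9 19 00 00 00       \tjmp    30 <bar>'
}

emit_sample | list_exit_points
```

For the sample above this reports the two "ret" instructions at 6 and
11 and the possible tail call at 12, so a return probe on "foo" would
need at least two kprobes, plus some handling of the jump into "bar".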


Thank you,


---

Masami Hiramatsu (1):
      [PoC] tracing: kprobe: Add non-stack intrusion return probe event


 arch/x86/kernel/kprobes/core.c |   59 +++++++++++++++++++++
 kernel/trace/trace_kprobe.c    |  110 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 164 insertions(+), 5 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>
