bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler
@ 2020-07-28  6:45 Muchun Song
  2020-08-03  5:46 ` Muchun Song
  2020-08-03  6:01 ` Song Liu
  0 siblings, 2 replies; 5+ messages in thread
From: Muchun Song @ 2020-07-28  6:45 UTC (permalink / raw)
  To: naveen.n.rao, anil.s.keshavamurthy, davem, mhiramat, ast, daniel,
	kafai, songliubraving, yhs, andriin, john.fastabend, kpsingh
  Cc: linux-kernel, netdev, bpf, Muchun Song, Chengming Zhou

We found a case of kernel panic on our server. The stack trace is as
follows(omit some irrelevant information):

  BUG: kernel NULL pointer dereference, address: 0000000000000080
  RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
  RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
  RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
  RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   ftrace_ops_assist_func+0x56/0xe0
   ftrace_call+0x5/0x34
   tcpa_statistic_send+0x5/0x130 [ttcp_engine]

The tcpa_statistic_send is the function being kprobed. After analysis,
the root cause is that the fourth parameter regs of kprobe_ftrace_handler
is NULL. Why regs is NULL? We use the crash tool to analyze the kdump.

  crash> dis tcpa_statistic_send -r
         <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>

The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller.
So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler
is NULL. In theory, we should call the ftrace_regs_caller instead of the
ftrace_caller. After in-depth analysis, we found a reproducible path.

  Writing a simple kernel module which starts a periodic timer. The
  timer's handler is named 'kprobe_test_timer_handler'. The module
  name is kprobe_test.ko.

  1) insmod kprobe_test.ko
  2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
  3) echo 0 > /proc/sys/kernel/ftrace_enabled
  4) rmmod kprobe_test
  5) stop step 2) kprobe
  6) insmod kprobe_test.ko
  7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'

We mark the kprobe as GONE but not disarm the kprobe in the step 4).
The step 5) also do not disarm the kprobe when unregister kprobe. So
we do not remove the ip from the filter. In this case, when the module
loads again in the step 6), we will replace the code to ftrace_caller
via the ftrace_module_enable(). When we register kprobe again, we will
not replace ftrace_caller to ftrace_regs_caller because the ftrace is
disabled in the step 3). So the step 7) will trigger kernel panic. Fix
this problem by disarming the kprobe when the module is going away.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/kprobes.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 146c648eb943..503add629599 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2148,6 +2148,13 @@ static void kill_kprobe(struct kprobe *p)
 	 * the original probed function (which will be freed soon) any more.
 	 */
 	arch_remove_kprobe(p);
+
+	/*
+	 * The module is going away. We should disarm the kprobe which
+	 * is using ftrace.
+	 */
+	if (kprobe_ftrace(p))
+		disarm_kprobe_ftrace(p);
 }
 
 /* Disable one kprobe */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler
  2020-07-28  6:45 [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler Muchun Song
@ 2020-08-03  5:46 ` Muchun Song
  2020-08-03 14:50   ` Masami Hiramatsu
  2020-08-03  6:01 ` Song Liu
  1 sibling, 1 reply; 5+ messages in thread
From: Muchun Song @ 2020-08-03  5:46 UTC (permalink / raw)
  To: naveen.n.rao, anil.s.keshavamurthy, davem, mhiramat, ast, daniel,
	kafai, songliubraving, yhs, andriin, john.fastabend, kpsingh
  Cc: LKML, netdev, bpf, Chengming Zhou

Ping guys. Any comments or suggestions?

On Tue, Jul 28, 2020 at 2:45 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> We found a case of kernel panic on our server. The stack trace is as
> follows(omit some irrelevant information):
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000080
>   RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
>   RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
>   RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
>   RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
>   RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>   R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    <IRQ>
>    ftrace_ops_assist_func+0x56/0xe0
>    ftrace_call+0x5/0x34
>    tcpa_statistic_send+0x5/0x130 [ttcp_engine]
>
> The tcpa_statistic_send is the function being kprobed. After analysis,
> the root cause is that the fourth parameter regs of kprobe_ftrace_handler
> is NULL. Why regs is NULL? We use the crash tool to analyze the kdump.
>
>   crash> dis tcpa_statistic_send -r
>          <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>
>
> The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller.
> So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler
> is NULL. In theory, we should call the ftrace_regs_caller instead of the
> ftrace_caller. After in-depth analysis, we found a reproducible path.
>
>   Writing a simple kernel module which starts a periodic timer. The
>   timer's handler is named 'kprobe_test_timer_handler'. The module
>   name is kprobe_test.ko.
>
>   1) insmod kprobe_test.ko
>   2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
>   3) echo 0 > /proc/sys/kernel/ftrace_enabled
>   4) rmmod kprobe_test
>   5) stop step 2) kprobe
>   6) insmod kprobe_test.ko
>   7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
>
> We mark the kprobe as GONE but not disarm the kprobe in the step 4).
> The step 5) also do not disarm the kprobe when unregister kprobe. So
> we do not remove the ip from the filter. In this case, when the module
> loads again in the step 6), we will replace the code to ftrace_caller
> via the ftrace_module_enable(). When we register kprobe again, we will
> not replace ftrace_caller to ftrace_regs_caller because the ftrace is
> disabled in the step 3). So the step 7) will trigger kernel panic. Fix
> this problem by disarming the kprobe when the module is going away.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  kernel/kprobes.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 146c648eb943..503add629599 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -2148,6 +2148,13 @@ static void kill_kprobe(struct kprobe *p)
>          * the original probed function (which will be freed soon) any more.
>          */
>         arch_remove_kprobe(p);
> +
> +       /*
> +        * The module is going away. We should disarm the kprobe which
> +        * is using ftrace.
> +        */
> +       if (kprobe_ftrace(p))
> +               disarm_kprobe_ftrace(p);
>  }
>
>  /* Disable one kprobe */
> --
> 2.11.0
>


-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler
  2020-07-28  6:45 [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler Muchun Song
  2020-08-03  5:46 ` Muchun Song
@ 2020-08-03  6:01 ` Song Liu
  1 sibling, 0 replies; 5+ messages in thread
From: Song Liu @ 2020-08-03  6:01 UTC (permalink / raw)
  To: Muchun Song
  Cc: naveen.n.rao, anil.s.keshavamurthy, davem, mhiramat, ast, daniel,
	Martin Lau, Yonghong Song, Andrii Nakryiko, john.fastabend,
	kpsingh, linux-kernel, netdev, bpf, Chengming Zhou



> On Jul 27, 2020, at 11:45 PM, Muchun Song <songmuchun@bytedance.com> wrote:
> 
> We found a case of kernel panic on our server. The stack trace is as
> follows(omit some irrelevant information):
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000080
>  RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
>  RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
>  RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
>  RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
>  RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>  R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   <IRQ>
>   ftrace_ops_assist_func+0x56/0xe0
>   ftrace_call+0x5/0x34
>   tcpa_statistic_send+0x5/0x130 [ttcp_engine]
> 
> The tcpa_statistic_send is the function being kprobed. After analysis,
> the root cause is that the fourth parameter regs of kprobe_ftrace_handler
> is NULL. Why regs is NULL? We use the crash tool to analyze the kdump.
> 
>  crash> dis tcpa_statistic_send -r
>         <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>
> 
> The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller.
> So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler
> is NULL. In theory, we should call the ftrace_regs_caller instead of the
> ftrace_caller. After in-depth analysis, we found a reproducible path.
> 
>  Writing a simple kernel module which starts a periodic timer. The
>  timer's handler is named 'kprobe_test_timer_handler'. The module
>  name is kprobe_test.ko.
> 
>  1) insmod kprobe_test.ko
>  2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
>  3) echo 0 > /proc/sys/kernel/ftrace_enabled
>  4) rmmod kprobe_test
>  5) stop step 2) kprobe
>  6) insmod kprobe_test.ko
>  7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
> 
> We mark the kprobe as GONE but not disarm the kprobe in the step 4).
> The step 5) also do not disarm the kprobe when unregister kprobe. So
> we do not remove the ip from the filter. In this case, when the module
> loads again in the step 6), we will replace the code to ftrace_caller
> via the ftrace_module_enable(). When we register kprobe again, we will
> not replace ftrace_caller to ftrace_regs_caller because the ftrace is
> disabled in the step 3). So the step 7) will trigger kernel panic. Fix
> this problem by disarming the kprobe when the module is going away.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Looks good. 

Acked-by: Song Liu <songliubraving@fb.com>

> ---
> kernel/kprobes.c | 7 +++++++
> 1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 146c648eb943..503add629599 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -2148,6 +2148,13 @@ static void kill_kprobe(struct kprobe *p)
> 	 * the original probed function (which will be freed soon) any more.
> 	 */
> 	arch_remove_kprobe(p);
> +
> +	/*
> +	 * The module is going away. We should disarm the kprobe which
> +	 * is using ftrace.
> +	 */
> +	if (kprobe_ftrace(p))
> +		disarm_kprobe_ftrace(p);
> }
> 
> /* Disable one kprobe */
> -- 
> 2.11.0
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler
  2020-08-03  5:46 ` Muchun Song
@ 2020-08-03 14:50   ` Masami Hiramatsu
  2020-08-03 19:58     ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Masami Hiramatsu @ 2020-08-03 14:50 UTC (permalink / raw)
  To: Muchun Song, Steven Rostedt
  Cc: naveen.n.rao, anil.s.keshavamurthy, davem, ast, daniel, kafai,
	songliubraving, yhs, andriin, john.fastabend, kpsingh, LKML,
	netdev, bpf, Chengming Zhou

On Mon, 3 Aug 2020 13:46:25 +0800
Muchun Song <songmuchun@bytedance.com> wrote:

> Ping guys. Any comments or suggestions?
> 
> On Tue, Jul 28, 2020 at 2:45 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > We found a case of kernel panic on our server. The stack trace is as
> > follows(omit some irrelevant information):
> >
> >   BUG: kernel NULL pointer dereference, address: 0000000000000080
> >   RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
> >   RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
> >   RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
> >   RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
> >   RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
> >   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >   R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
> >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
> >   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >   Call Trace:
> >    <IRQ>
> >    ftrace_ops_assist_func+0x56/0xe0
> >    ftrace_call+0x5/0x34
> >    tcpa_statistic_send+0x5/0x130 [ttcp_engine]
> >
> > The tcpa_statistic_send is the function being kprobed. After analysis,
> > the root cause is that the fourth parameter regs of kprobe_ftrace_handler
> > is NULL. Why regs is NULL? We use the crash tool to analyze the kdump.
> >
> >   crash> dis tcpa_statistic_send -r
> >          <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>
> >
> > The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller.
> > So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler
> > is NULL. In theory, we should call the ftrace_regs_caller instead of the
> > ftrace_caller. After in-depth analysis, we found a reproducible path.
> >
> >   Writing a simple kernel module which starts a periodic timer. The
> >   timer's handler is named 'kprobe_test_timer_handler'. The module
> >   name is kprobe_test.ko.
> >
> >   1) insmod kprobe_test.ko
> >   2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
> >   3) echo 0 > /proc/sys/kernel/ftrace_enabled
> >   4) rmmod kprobe_test
> >   5) stop step 2) kprobe
> >   6) insmod kprobe_test.ko
> >   7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
> >
> > We mark the kprobe as GONE but not disarm the kprobe in the step 4).
> > The step 5) also do not disarm the kprobe when unregister kprobe. So
> > we do not remove the ip from the filter. In this case, when the module
> > loads again in the step 6), we will replace the code to ftrace_caller
> > via the ftrace_module_enable(). When we register kprobe again, we will
> > not replace ftrace_caller to ftrace_regs_caller because the ftrace is
> > disabled in the step 3). So the step 7) will trigger kernel panic. Fix
> > this problem by disarming the kprobe when the module is going away.
> >

Nice catch!

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
Cc: stable@vger.kernel.org

Thank you!

> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > ---
> >  kernel/kprobes.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 146c648eb943..503add629599 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -2148,6 +2148,13 @@ static void kill_kprobe(struct kprobe *p)
> >          * the original probed function (which will be freed soon) any more.
> >          */
> >         arch_remove_kprobe(p);
> > +
> > +       /*
> > +        * The module is going away. We should disarm the kprobe which
> > +        * is using ftrace.
> > +        */
> > +       if (kprobe_ftrace(p))
> > +               disarm_kprobe_ftrace(p);
> >  }
> >
> >  /* Disable one kprobe */
> > --
> > 2.11.0
> >
> 
> 
> -- 
> Yours,
> Muchun


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler
  2020-08-03 14:50   ` Masami Hiramatsu
@ 2020-08-03 19:58     ` Steven Rostedt
  0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2020-08-03 19:58 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Muchun Song, naveen.n.rao, anil.s.keshavamurthy, davem, ast,
	daniel, kafai, songliubraving, yhs, andriin, john.fastabend,
	kpsingh, LKML, netdev, bpf, Chengming Zhou

On Mon, 3 Aug 2020 23:50:42 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Nice catch!
> 
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
> Cc: stable@vger.kernel.org

Thanks Masami,

I'll add this to my queue for the merge window.

-- Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-03 19:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28  6:45 [PATCH] kprobes: fix NULL pointer dereference at kprobe_ftrace_handler Muchun Song
2020-08-03  5:46 ` Muchun Song
2020-08-03 14:50   ` Masami Hiramatsu
2020-08-03 19:58     ` Steven Rostedt
2020-08-03  6:01 ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).