linux-kernel.vger.kernel.org archive mirror
* [RFC] kprobes: Fix locking in recycle_rp_inst
@ 2019-02-26 16:10 Jiri Olsa
  2019-02-27  8:38 ` Masami Hiramatsu
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2019-02-26 16:10 UTC (permalink / raw)
  To: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller, Masami Hiramatsu
  Cc: lkml, Srinivasa D S, Hien Nguyen, David Valin

hi,
David reported a crash related to return kprobes.
The change below tries to explain it and fix it,
but I'm sending it as RFC because I don't follow
kprobes code that much, so I might have missed
something.

thanks,
jirka


---
We can call recycle_rp_inst from both task and irq contexts,
so we should use irqsave/irqrestore locking functions.

I wasn't able to hit this particular lockup, but I found it
while checking why a return probe on _raw_spin_lock locks up
the system, as reported by David using bpftrace with a simple
script like:

  kprobe:_raw_spin_lock
  {
      @time[tid] = nsecs;
      @symb[tid] = arg0;
  }

  kretprobe:_raw_spin_lock
      / @time[tid] /
  {
      delete(@time[tid]);
      delete(@symb[tid]);
  }

or by perf tool:

  # perf probe -a _raw_spin_lock:%return
  # perf record -e probe:_raw_spin_lock__return -a

The thing is that the _raw_spin_lock call in recycle_rp_inst
is the only one that the return probe code paths call, and it
triggers another kprobe instance while already processing one
and locks up on the kretprobe_table_lock lock:

  #12 [ffff99c337403d28] queued_spin_lock_slowpath at ffffffff9712693b
  #13 [ffff99c337403d28] _raw_spin_lock_irqsave at ffffffff9794c100
  #14 [ffff99c337403d38] pre_handler_kretprobe at ffffffff9719435c
  #15 [ffff99c337403d68] kprobe_ftrace_handler at ffffffff97059f12
  #16 [ffff99c337403d98] ftrace_ops_assist_func at ffffffff971a0421
  #17 [ffff99c337403dd8] handle_edge_irq at ffffffff97139f55
  #18 [ffff99c337403df0] handle_edge_irq at ffffffff97139f55
  #19 [ffff99c337403e58] _raw_spin_lock at ffffffff9794c111
  #20 [ffff99c337403e88] _raw_spin_lock at ffffffff9794c115
  #21 [ffff99c337403ea8] trampoline_handler at ffffffff97058a8f
  #22 [ffff99c337403f00] kretprobe_trampoline at ffffffff970586d5
  #23 [ffff99c337403fb0] handle_irq at ffffffff97027b1f
  #24 [ffff99c337403fc0] do_IRQ at ffffffff97a01bc9
  --- <IRQ stack> ---
  #25 [ffffa5c3c1f9fb38] ret_from_intr at ffffffff97a0098f
      [exception RIP: smp_call_function_many+460]
      RIP: ffffffff9716685c  RSP: ffffa5c3c1f9fbe0  RFLAGS: 00000202
      RAX: 0000000000000005  RBX: ffff99c337421c80  RCX: ffff99c337566260
      RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff99c337421c88
      RBP: ffff99c337421c88   R8: 0000000000000001   R9: ffffffff98352940
      R10: ffff99c33703c910  R11: ffffffff9794c110  R12: ffffffff97055680
      R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000040
      ORIG_RAX: ffffffffffffffde  CS: 0010  SS: 0018
  #26 [ffffa5c3c1f9fc20] on_each_cpu at ffffffff97166918
  #27 [ffffa5c3c1f9fc40] ftrace_replace_code at ffffffff97055a34
  #28 [ffffa5c3c1f9fc88] ftrace_modify_all_code at ffffffff971a3552
  #29 [ffffa5c3c1f9fca8] arch_ftrace_update_code at ffffffff97055a6c
  #30 [ffffa5c3c1f9fcb0] ftrace_run_update_code at ffffffff971a3683
  #31 [ffffa5c3c1f9fcc0] ftrace_startup at ffffffff971a6638
  #32 [ffffa5c3c1f9fce8] register_ftrace_function at ffffffff971a66a0

When we switch it to raw_spin_lock_irqsave, the return probe
on _raw_spin_lock starts working.

Reported-by: David Valin <dvalin@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/kprobes.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c83e54727131..c82056b354cc 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1154,9 +1154,11 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
 	hlist_del(&ri->hlist);
 	INIT_HLIST_NODE(&ri->hlist);
 	if (likely(rp)) {
-		raw_spin_lock(&rp->lock);
+		unsigned long flags;
+
+		raw_spin_lock_irqsave(&rp->lock, flags);
 		hlist_add_head(&ri->hlist, &rp->free_instances);
-		raw_spin_unlock(&rp->lock);
+		raw_spin_unlock_irqrestore(&rp->lock, flags);
 	} else
 		/* Unregistering */
 		hlist_add_head(&ri->hlist, head);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] kprobes: Fix locking in recycle_rp_inst
  2019-02-26 16:10 [RFC] kprobes: Fix locking in recycle_rp_inst Jiri Olsa
@ 2019-02-27  8:38 ` Masami Hiramatsu
  2019-02-27 13:33   ` Jiri Olsa
  0 siblings, 1 reply; 5+ messages in thread
From: Masami Hiramatsu @ 2019-02-27  8:38 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller, lkml,
	Srinivasa D S, Hien Nguyen, David Valin, Ingo Molnar

Hi Jiri,

On Tue, 26 Feb 2019 17:10:36 +0100
Jiri Olsa <jolsa@kernel.org> wrote:

> hi,
> David reported a crash related to return kprobes.
> The change below tries to explain it and fix it,
> but I'm sending it as RFC because I don't follow
> kprobes code that much, so I might have missed
> something.

Thank you for the report and patch!

> 
> thanks,
> jirka
> 
> 
> ---
> We can call recycle_rp_inst from both task and irq contexts,
> so we should use irqsave/irqrestore locking functions.
> 
> I wasn't able to hit this particular lockup, but I found it
> while checking on why return probe on _raw_spin_lock locks
> the system, reported by David by using bpftrace on simple
> script, like:
> 
>   kprobe:_raw_spin_lock
>   {
>       @time[tid] = nsecs;
>       @symb[tid] = arg0;
>   }
> 
>   kretprobe:_raw_spin_lock
>       / @time[tid] /
>   {
>       delete(@time[tid]);
>       delete(@symb[tid]);
>   }
> 
> or by perf tool:
> 
>   # perf probe -a _raw_spin_lock:%return
>   # perf record -e probe:_raw_spin_lock__return -a
> 
> The thing is that the _raw_spin_lock call in recycle_rp_inst,
> is the only one that return-probe-code-paths call, and it
> triggers another kprobe instance while already processing one
> and locks up on kretprobe_table_lock lock:
> 
>   #12 [ffff99c337403d28] queued_spin_lock_slowpath at ffffffff9712693b
>   #13 [ffff99c337403d28] _raw_spin_lock_irqsave at ffffffff9794c100
>   #14 [ffff99c337403d38] pre_handler_kretprobe at ffffffff9719435c
>   #15 [ffff99c337403d68] kprobe_ftrace_handler at ffffffff97059f12
>   #16 [ffff99c337403d98] ftrace_ops_assist_func at ffffffff971a0421
>   #17 [ffff99c337403dd8] handle_edge_irq at ffffffff97139f55
>   #18 [ffff99c337403df0] handle_edge_irq at ffffffff97139f55
>   #19 [ffff99c337403e58] _raw_spin_lock at ffffffff9794c111
>   #20 [ffff99c337403e88] _raw_spin_lock at ffffffff9794c115
>   #21 [ffff99c337403ea8] trampoline_handler at ffffffff97058a8f
>   #22 [ffff99c337403f00] kretprobe_trampoline at ffffffff970586d5
>   #23 [ffff99c337403fb0] handle_irq at ffffffff97027b1f
>   #24 [ffff99c337403fc0] do_IRQ at ffffffff97a01bc9
>   --- <IRQ stack> ---
>   #25 [ffffa5c3c1f9fb38] ret_from_intr at ffffffff97a0098f
>       [exception RIP: smp_call_function_many+460]
>       RIP: ffffffff9716685c  RSP: ffffa5c3c1f9fbe0  RFLAGS: 00000202
>       RAX: 0000000000000005  RBX: ffff99c337421c80  RCX: ffff99c337566260
>       RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff99c337421c88
>       RBP: ffff99c337421c88   R8: 0000000000000001   R9: ffffffff98352940
>       R10: ffff99c33703c910  R11: ffffffff9794c110  R12: ffffffff97055680
>       R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000040
>       ORIG_RAX: ffffffffffffffde  CS: 0010  SS: 0018
>   #26 [ffffa5c3c1f9fc20] on_each_cpu at ffffffff97166918
>   #27 [ffffa5c3c1f9fc40] ftrace_replace_code at ffffffff97055a34
>   #28 [ffffa5c3c1f9fc88] ftrace_modify_all_code at ffffffff971a3552
>   #29 [ffffa5c3c1f9fca8] arch_ftrace_update_code at ffffffff97055a6c
>   #30 [ffffa5c3c1f9fcb0] ftrace_run_update_code at ffffffff971a3683
>   #31 [ffffa5c3c1f9fcc0] ftrace_startup at ffffffff971a6638
>   #32 [ffffa5c3c1f9fce8] register_ftrace_function at ffffffff971a66a0
> 
> When we switch it to raw_spin_lock_irqsave the return probe
> on _raw_spin_lock starts working.

Yes, there can be a race between a probe and a probe on an irq handler.

kretprobe_hash_lock()/kretprobe_hash_unlock() are safe because
they disable irqs. Only recycle_rp_inst() has this problem.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

And this is one of the oldest bugs in kprobes.

commit ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")

introduced the spin_lock(&rp->lock) in recycle_rp_inst() but forgot to disable irqs.
And 

commit c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")

introduced an assembly-based trampoline which didn't disable irqs.

Could you add Cc:stable to this patch too?

Thank you!!

> 
> Reported-by: David Valin <dvalin@redhat.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/kprobes.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index c83e54727131..c82056b354cc 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1154,9 +1154,11 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
>  	hlist_del(&ri->hlist);
>  	INIT_HLIST_NODE(&ri->hlist);
>  	if (likely(rp)) {
> -		raw_spin_lock(&rp->lock);
> +		unsigned long flags;
> +
> +		raw_spin_lock_irqsave(&rp->lock, flags);
>  		hlist_add_head(&ri->hlist, &rp->free_instances);
> -		raw_spin_unlock(&rp->lock);
> +		raw_spin_unlock_irqrestore(&rp->lock, flags);
>  	} else
>  		/* Unregistering */
>  		hlist_add_head(&ri->hlist, head);
> -- 
> 2.17.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFC] kprobes: Fix locking in recycle_rp_inst
  2019-02-27  8:38 ` Masami Hiramatsu
@ 2019-02-27 13:33   ` Jiri Olsa
  2019-02-28  6:50     ` Masami Hiramatsu
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2019-02-27 13:33 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	lkml, Srinivasa D S, Hien Nguyen, David Valin, Ingo Molnar,
	stable

On Wed, Feb 27, 2019 at 05:38:46PM +0900, Masami Hiramatsu wrote:

SNIP

> > When we switch it to raw_spin_lock_irqsave the return probe
> > on _raw_spin_lock starts working.
> 
> Yes, there can be a race between probes and probe on irq handler.
> 
> kretprobe_hash_lock()/kretprobe_hash_unlock() are safe because
> those disables irqs. Only recycle_rp_inst() has this problem.
> 
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> And this is one of the oldest bug in kprobe.
> 
> commit ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")
> 
> introduced the spin_lock(&rp->lock) in recycle_rp_inst() but forgot to disable irqs.
> And 
> 
> commit c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")

ok, so I'll add:

Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")

> 
> introduced assembly-based trampoline which didn't disable irq.
> 
> Could you add Cc:stable to this patch too?

sure, attaching patch with updated changelog

thanks,
jirka


---
We can call recycle_rp_inst from both task and irq contexts,
so we should use irqsave/irqrestore locking functions.

I wasn't able to hit this particular lockup, but I found it
while checking why a return probe on _raw_spin_lock locks up
the system, as reported by David using bpftrace with a simple
script like:

  kprobe:_raw_spin_lock
  {
      @time[tid] = nsecs;
      @symb[tid] = arg0;
  }

  kretprobe:_raw_spin_lock
      / @time[tid] /
  {
      delete(@time[tid]);
      delete(@symb[tid]);
  }

or by perf tool:

  # perf probe -a _raw_spin_lock:%return
  # perf record -e probe:_raw_spin_lock__return -a

The thing is that the _raw_spin_lock call in recycle_rp_inst
is the only one that the return probe code paths call, and it
will trigger another kprobe instance while already processing
one and lock up on the kretprobe_table_lock lock:

  #12 [ffff99c337403d28] queued_spin_lock_slowpath at ffffffff9712693b
  #13 [ffff99c337403d28] _raw_spin_lock_irqsave at ffffffff9794c100
  #14 [ffff99c337403d38] pre_handler_kretprobe at ffffffff9719435c
  #15 [ffff99c337403d68] kprobe_ftrace_handler at ffffffff97059f12
  #16 [ffff99c337403d98] ftrace_ops_assist_func at ffffffff971a0421
  #17 [ffff99c337403dd8] handle_edge_irq at ffffffff97139f55
  #18 [ffff99c337403df0] handle_edge_irq at ffffffff97139f55
  #19 [ffff99c337403e58] _raw_spin_lock at ffffffff9794c111
  #20 [ffff99c337403e88] _raw_spin_lock at ffffffff9794c115
  #21 [ffff99c337403ea8] trampoline_handler at ffffffff97058a8f
  #22 [ffff99c337403f00] kretprobe_trampoline at ffffffff970586d5
  #23 [ffff99c337403fb0] handle_irq at ffffffff97027b1f
  #24 [ffff99c337403fc0] do_IRQ at ffffffff97a01bc9
  --- <IRQ stack> ---
  #25 [ffffa5c3c1f9fb38] ret_from_intr at ffffffff97a0098f
      [exception RIP: smp_call_function_many+460]
      RIP: ffffffff9716685c  RSP: ffffa5c3c1f9fbe0  RFLAGS: 00000202
      RAX: 0000000000000005  RBX: ffff99c337421c80  RCX: ffff99c337566260
      RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff99c337421c88
      RBP: ffff99c337421c88   R8: 0000000000000001   R9: ffffffff98352940
      R10: ffff99c33703c910  R11: ffffffff9794c110  R12: ffffffff97055680
      R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000040
      ORIG_RAX: ffffffffffffffde  CS: 0010  SS: 0018
  #26 [ffffa5c3c1f9fc20] on_each_cpu at ffffffff97166918
  #27 [ffffa5c3c1f9fc40] ftrace_replace_code at ffffffff97055a34
  #28 [ffffa5c3c1f9fc88] ftrace_modify_all_code at ffffffff971a3552
  #29 [ffffa5c3c1f9fca8] arch_ftrace_update_code at ffffffff97055a6c
  #30 [ffffa5c3c1f9fcb0] ftrace_run_update_code at ffffffff971a3683
  #31 [ffffa5c3c1f9fcc0] ftrace_startup at ffffffff971a6638
  #32 [ffffa5c3c1f9fce8] register_ftrace_function at ffffffff971a66a0

When we switch it to raw_spin_lock_irqsave, the return probe
on _raw_spin_lock starts working.

Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
Cc: stable@vger.kernel.org
Reported-by: David Valin <dvalin@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/kprobes.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c83e54727131..c82056b354cc 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1154,9 +1154,11 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
 	hlist_del(&ri->hlist);
 	INIT_HLIST_NODE(&ri->hlist);
 	if (likely(rp)) {
-		raw_spin_lock(&rp->lock);
+		unsigned long flags;
+
+		raw_spin_lock_irqsave(&rp->lock, flags);
 		hlist_add_head(&ri->hlist, &rp->free_instances);
-		raw_spin_unlock(&rp->lock);
+		raw_spin_unlock_irqrestore(&rp->lock, flags);
 	} else
 		/* Unregistering */
 		hlist_add_head(&ri->hlist, head);
-- 
2.17.2



* Re: [RFC] kprobes: Fix locking in recycle_rp_inst
  2019-02-27 13:33   ` Jiri Olsa
@ 2019-02-28  6:50     ` Masami Hiramatsu
  2019-03-01 23:00       ` David Valin
  0 siblings, 1 reply; 5+ messages in thread
From: Masami Hiramatsu @ 2019-02-28  6:50 UTC (permalink / raw)
  To: Jiri Olsa, Ingo Molnar
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	lkml, Srinivasa D S, Hien Nguyen, David Valin, stable

On Wed, 27 Feb 2019 14:33:09 +0100
Jiri Olsa <jolsa@redhat.com> wrote:

> On Wed, Feb 27, 2019 at 05:38:46PM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > > When we switch it to raw_spin_lock_irqsave the return probe
> > > on _raw_spin_lock starts working.
> > 
> > Yes, there can be a race between probes and probe on irq handler.
> > 
> > kretprobe_hash_lock()/kretprobe_hash_unlock() are safe because
> > those disables irqs. Only recycle_rp_inst() has this problem.
> > 
> > Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> > 
> > And this is one of the oldest bug in kprobe.
> > 
> > commit ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")
> > 
> > introduced the spin_lock(&rp->lock) in recycle_rp_inst() but forgot to disable irqs.
> > And 
> > 
> > commit c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
> 
> ok, so I'll add:
> 
> Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
> 
> > 
> > introduced assembly-based trampoline which didn't disable irq.
> > 
> > Could you add Cc:stable to this patch too?
> 
> sure, attaching patch with updated changelog

Thank you Jiri,

Ingo, could you pick this patch? (and my kretprobe fixup series?)

Thank you,

> 
> thanks,
> jirka
> 
> 
> ---
> We can call recycle_rp_inst from both task and irq contexts,
> so we should irqsave/irqrestore locking functions.
> 
> I wasn't able to hit this particular lockup, but I found it
> while checking on why return probe on _raw_spin_lock locks
> the system, reported by David by using bpftrace on simple
> script, like:
> 
>   kprobe:_raw_spin_lock
>   {
>       @time[tid] = nsecs;
>       @symb[tid] = arg0;
>   }
> 
>   kretprobe:_raw_spin_lock
>       / @time[tid] /
>   {
>       delete(@time[tid]);
>       delete(@symb[tid]);
>   }
> 
> or by perf tool:
> 
>   # perf probe -a _raw_spin_lock:%return
>   # perf record -e probe:_raw_spin_lock__return -a
> 
> The thing is that the _raw_spin_lock call in recycle_rp_inst,
> is the only one that return probe code paths call and it will
> trigger another kprobe instance while already processing one
> and lock up on kretprobe_table_lock lock:
> 
>   #12 [ffff99c337403d28] queued_spin_lock_slowpath at ffffffff9712693b
>   #13 [ffff99c337403d28] _raw_spin_lock_irqsave at ffffffff9794c100
>   #14 [ffff99c337403d38] pre_handler_kretprobe at ffffffff9719435c
>   #15 [ffff99c337403d68] kprobe_ftrace_handler at ffffffff97059f12
>   #16 [ffff99c337403d98] ftrace_ops_assist_func at ffffffff971a0421
>   #17 [ffff99c337403dd8] handle_edge_irq at ffffffff97139f55
>   #18 [ffff99c337403df0] handle_edge_irq at ffffffff97139f55
>   #19 [ffff99c337403e58] _raw_spin_lock at ffffffff9794c111
>   #20 [ffff99c337403e88] _raw_spin_lock at ffffffff9794c115
>   #21 [ffff99c337403ea8] trampoline_handler at ffffffff97058a8f
>   #22 [ffff99c337403f00] kretprobe_trampoline at ffffffff970586d5
>   #23 [ffff99c337403fb0] handle_irq at ffffffff97027b1f
>   #24 [ffff99c337403fc0] do_IRQ at ffffffff97a01bc9
>   --- <IRQ stack> ---
>   #25 [ffffa5c3c1f9fb38] ret_from_intr at ffffffff97a0098f
>       [exception RIP: smp_call_function_many+460]
>       RIP: ffffffff9716685c  RSP: ffffa5c3c1f9fbe0  RFLAGS: 00000202
>       RAX: 0000000000000005  RBX: ffff99c337421c80  RCX: ffff99c337566260
>       RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff99c337421c88
>       RBP: ffff99c337421c88   R8: 0000000000000001   R9: ffffffff98352940
>       R10: ffff99c33703c910  R11: ffffffff9794c110  R12: ffffffff97055680
>       R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000040
>       ORIG_RAX: ffffffffffffffde  CS: 0010  SS: 0018
>   #26 [ffffa5c3c1f9fc20] on_each_cpu at ffffffff97166918
>   #27 [ffffa5c3c1f9fc40] ftrace_replace_code at ffffffff97055a34
>   #28 [ffffa5c3c1f9fc88] ftrace_modify_all_code at ffffffff971a3552
>   #29 [ffffa5c3c1f9fca8] arch_ftrace_update_code at ffffffff97055a6c
>   #30 [ffffa5c3c1f9fcb0] ftrace_run_update_code at ffffffff971a3683
>   #31 [ffffa5c3c1f9fcc0] ftrace_startup at ffffffff971a6638
>   #32 [ffffa5c3c1f9fce8] register_ftrace_function at ffffffff971a66a0
> 
> When we switch it to raw_spin_lock_irqsave the return probe
> on _raw_spin_lock starts working.
> 
> Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
> Cc: stable@vger.kernel.org
> Reported-by: David Valin <dvalin@redhat.com>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/kprobes.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index c83e54727131..c82056b354cc 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1154,9 +1154,11 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
>  	hlist_del(&ri->hlist);
>  	INIT_HLIST_NODE(&ri->hlist);
>  	if (likely(rp)) {
> -		raw_spin_lock(&rp->lock);
> +		unsigned long flags;
> +
> +		raw_spin_lock_irqsave(&rp->lock, flags);
>  		hlist_add_head(&ri->hlist, &rp->free_instances);
> -		raw_spin_unlock(&rp->lock);
> +		raw_spin_unlock_irqrestore(&rp->lock, flags);
>  	} else
>  		/* Unregistering */
>  		hlist_add_head(&ri->hlist, head);
> -- 
> 2.17.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFC] kprobes: Fix locking in recycle_rp_inst
  2019-02-28  6:50     ` Masami Hiramatsu
@ 2019-03-01 23:00       ` David Valin
  0 siblings, 0 replies; 5+ messages in thread
From: David Valin @ 2019-03-01 23:00 UTC (permalink / raw)
  To: Masami Hiramatsu, Jiri Olsa, Ingo Molnar
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	lkml, Srinivasa D S, Hien Nguyen, stable

Finally got around to testing it. Verified the issue was still
present, then installed Jiri's fix, which cleared the issue out.
So at least from the standpoint of fixing the hang, the change is good.

Dave

On 2/28/19 1:50 AM, Masami Hiramatsu wrote:
> On Wed, 27 Feb 2019 14:33:09 +0100
> Jiri Olsa <jolsa@redhat.com> wrote:
>
>> On Wed, Feb 27, 2019 at 05:38:46PM +0900, Masami Hiramatsu wrote:
>>
>> SNIP
>>
>>>> When we switch it to raw_spin_lock_irqsave the return probe
>>>> on _raw_spin_lock starts working.
>>> Yes, there can be a race between probes and probe on irq handler.
>>>
>>> kretprobe_hash_lock()/kretprobe_hash_unlock() are safe because
>>> those disables irqs. Only recycle_rp_inst() has this problem.
>>>
>>> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
>>>
>>> And this is one of the oldest bug in kprobe.
>>>
>>> commit ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")
>>>
>>> introduced the spin_lock(&rp->lock) in recycle_rp_inst() but forgot to disable irqs.
>>> And
>>>
>>> commit c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
>> ok, so I'll add:
>>
>> Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
>>
>>> introduced assembly-based trampoline which didn't disable irq.
>>>
>>> Could you add Cc:stable to this patch too?
>> sure, attaching patch with updated changelog
> Thank you Jiri,
>
> Ingo, could you pick this patch? (and my kretprobe fixup series?)
>
> Thank you,
>
>> thanks,
>> jirka
>>
>>
>> ---
>> We can call recycle_rp_inst from both task and irq contexts,
>> so we should irqsave/irqrestore locking functions.
>>
>> I wasn't able to hit this particular lockup, but I found it
>> while checking on why return probe on _raw_spin_lock locks
>> the system, reported by David by using bpftrace on simple
>> script, like:
>>
>>    kprobe:_raw_spin_lock
>>    {
>>        @time[tid] = nsecs;
>>        @symb[tid] = arg0;
>>    }
>>
>>    kretprobe:_raw_spin_lock
>>        / @time[tid] /
>>    {
>>        delete(@time[tid]);
>>        delete(@symb[tid]);
>>    }
>>
>> or by perf tool:
>>
>>    # perf probe -a _raw_spin_lock:%return
>>    # perf record -e probe:_raw_spin_lock__return -a
>>
>> The thing is that the _raw_spin_lock call in recycle_rp_inst,
>> is the only one that return probe code paths call and it will
>> trigger another kprobe instance while already processing one
>> and lock up on kretprobe_table_lock lock:
>>
>>    #12 [ffff99c337403d28] queued_spin_lock_slowpath at ffffffff9712693b
>>    #13 [ffff99c337403d28] _raw_spin_lock_irqsave at ffffffff9794c100
>>    #14 [ffff99c337403d38] pre_handler_kretprobe at ffffffff9719435c
>>    #15 [ffff99c337403d68] kprobe_ftrace_handler at ffffffff97059f12
>>    #16 [ffff99c337403d98] ftrace_ops_assist_func at ffffffff971a0421
>>    #17 [ffff99c337403dd8] handle_edge_irq at ffffffff97139f55
>>    #18 [ffff99c337403df0] handle_edge_irq at ffffffff97139f55
>>    #19 [ffff99c337403e58] _raw_spin_lock at ffffffff9794c111
>>    #20 [ffff99c337403e88] _raw_spin_lock at ffffffff9794c115
>>    #21 [ffff99c337403ea8] trampoline_handler at ffffffff97058a8f
>>    #22 [ffff99c337403f00] kretprobe_trampoline at ffffffff970586d5
>>    #23 [ffff99c337403fb0] handle_irq at ffffffff97027b1f
>>    #24 [ffff99c337403fc0] do_IRQ at ffffffff97a01bc9
>>    --- <IRQ stack> ---
>>    #25 [ffffa5c3c1f9fb38] ret_from_intr at ffffffff97a0098f
>>        [exception RIP: smp_call_function_many+460]
>>        RIP: ffffffff9716685c  RSP: ffffa5c3c1f9fbe0  RFLAGS: 00000202
>>        RAX: 0000000000000005  RBX: ffff99c337421c80  RCX: ffff99c337566260
>>        RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff99c337421c88
>>        RBP: ffff99c337421c88   R8: 0000000000000001   R9: ffffffff98352940
>>        R10: ffff99c33703c910  R11: ffffffff9794c110  R12: ffffffff97055680
>>        R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000040
>>        ORIG_RAX: ffffffffffffffde  CS: 0010  SS: 0018
>>    #26 [ffffa5c3c1f9fc20] on_each_cpu at ffffffff97166918
>>    #27 [ffffa5c3c1f9fc40] ftrace_replace_code at ffffffff97055a34
>>    #28 [ffffa5c3c1f9fc88] ftrace_modify_all_code at ffffffff971a3552
>>    #29 [ffffa5c3c1f9fca8] arch_ftrace_update_code at ffffffff97055a6c
>>    #30 [ffffa5c3c1f9fcb0] ftrace_run_update_code at ffffffff971a3683
>>    #31 [ffffa5c3c1f9fcc0] ftrace_startup at ffffffff971a6638
>>    #32 [ffffa5c3c1f9fce8] register_ftrace_function at ffffffff971a66a0
>>
>> When we switch it to raw_spin_lock_irqsave the return probe
>> on _raw_spin_lock starts working.
>>
>> Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
>> Cc: stable@vger.kernel.org
>> Reported-by: David Valin <dvalin@redhat.com>
>> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>> ---
>>   kernel/kprobes.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>> index c83e54727131..c82056b354cc 100644
>> --- a/kernel/kprobes.c
>> +++ b/kernel/kprobes.c
>> @@ -1154,9 +1154,11 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
>>   	hlist_del(&ri->hlist);
>>   	INIT_HLIST_NODE(&ri->hlist);
>>   	if (likely(rp)) {
>> -		raw_spin_lock(&rp->lock);
>> +		unsigned long flags;
>> +
>> +		raw_spin_lock_irqsave(&rp->lock, flags);
>>   		hlist_add_head(&ri->hlist, &rp->free_instances);
>> -		raw_spin_unlock(&rp->lock);
>> +		raw_spin_unlock_irqrestore(&rp->lock, flags);
>>   	} else
>>   		/* Unregistering */
>>   		hlist_add_head(&ri->hlist, head);
>> -- 
>> 2.17.2
>>
>
