linux-kernel.vger.kernel.org archive mirror
* [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
@ 2021-09-28  3:12 王贇
  2021-10-09  0:03 ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: 王贇 @ 2021-09-28  3:12 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, open list

With CONFIG_DEBUG_PREEMPT we observed reports like:

  BUG: using smp_processor_id() in preemptible
  caller is perf_ftrace_function_call+0x6f/0x2e0
  CPU: 1 PID: 680 Comm: a.out Not tainted
  Call Trace:
   <TASK>
   dump_stack_lvl+0x8d/0xcf
   check_preemption_disabled+0x104/0x110
   ? optimize_nops.isra.7+0x230/0x230
   ? text_poke_bp_batch+0x9f/0x310
   perf_ftrace_function_call+0x6f/0x2e0
   ...
   __text_poke+0x5/0x620
   text_poke_bp_batch+0x9f/0x310

This tells us that the CPU can change after the task is preempted, so
a CPU check done before preemption is no longer valid.

This patch simply disables preemption in perf_ftrace_function_call()
to prevent the CPU from changing.
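
For illustration, the problematic window looks roughly like this (sketch
only, based on the per-CPU check visible in the diff below):

	/*
	 * With preemption enabled the task may be preempted and migrated
	 * right around this check, so smp_processor_id() trips the
	 * CONFIG_DEBUG_PREEMPT warning and the per-CPU assumption made
	 * below the check may already be stale.
	 */
	if ((unsigned long)ops->private != smp_processor_id())
		return;
	/* <-- preempted here, rescheduled on another CPU */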

Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
 kernel/trace/trace_event_perf.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 6aed10e..dcbefdf 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -441,12 +441,19 @@ void perf_trace_buf_update(void *record, u16 type)
 	if (!rcu_is_watching())
 		return;

+	/*
+	 * Prevent the CPU from changing from now on. RCU
+	 * must be watching if the task was migrated and
+	 * rescheduled.
+	 */
+	preempt_disable_notrace();
+
 	if ((unsigned long)ops->private != smp_processor_id())
-		return;
+		goto out;

 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
-		return;
+		goto out;

 	event = container_of(ops, struct perf_event, ftrace_ops);

@@ -468,16 +475,18 @@ void perf_trace_buf_update(void *record, u16 type)

 	entry = perf_trace_buf_alloc(ENTRY_SIZE, NULL, &rctx);
 	if (!entry)
-		goto out;
+		goto unlock;

 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
 	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, TRACE_FN,
 			      1, &regs, &head, NULL);

-out:
+unlock:
 	ftrace_test_recursion_unlock(bit);
 #undef ENTRY_SIZE
+out:
+	preempt_enable_notrace();
 }

 static int perf_ftrace_function_register(struct perf_event *event)
-- 
1.8.3.1



* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-09-28  3:12 [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call() 王贇
@ 2021-10-09  0:03 ` Steven Rostedt
  2021-10-11  2:38   ` 王贇
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2021-10-09  0:03 UTC (permalink / raw)
  To: 王贇; +Cc: Ingo Molnar, open list, Peter Zijlstra, Jiri Olsa

On Tue, 28 Sep 2021 11:12:24 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:

> With CONFIG_DEBUG_PREEMPT we observed reports like:
> 
>   BUG: using smp_processor_id() in preemptible
>   caller is perf_ftrace_function_call+0x6f/0x2e0
>   CPU: 1 PID: 680 Comm: a.out Not tainted
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x8d/0xcf
>    check_preemption_disabled+0x104/0x110
>    ? optimize_nops.isra.7+0x230/0x230
>    ? text_poke_bp_batch+0x9f/0x310
>    perf_ftrace_function_call+0x6f/0x2e0
>    ...

It would be useful to see the full backtrace, to know how this
happened. Please do not shorten back traces.

-- Steve

>    __text_poke+0x5/0x620
>    text_poke_bp_batch+0x9f/0x310
> 
> This telling us the CPU could be changed after task is preempted, and
> the checking on CPU before preemption will be invalid.
> 
> This patch just turn off preemption in perf_ftrace_function_call()
> to prevent CPU changing.
> 
> Reported-by: Abaci <abaci@linux.alibaba.com>
> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
> ---
>  kernel/trace/trace_event_perf.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index 6aed10e..dcbefdf 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -441,12 +441,19 @@ void perf_trace_buf_update(void *record, u16 type)
>  	if (!rcu_is_watching())
>  		return;
> 
> +	/*
> +	 * Prevent CPU changing from now on. rcu must
> +	 * be in watching if the task was migrated and
> +	 * scheduled.
> +	 */
> +	preempt_disable_notrace();
> +
>  	if ((unsigned long)ops->private != smp_processor_id())
> -		return;
> +		goto out;
> 
>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>  	if (bit < 0)
> -		return;
> +		goto out;
> 
>  	event = container_of(ops, struct perf_event, ftrace_ops);
> 
> @@ -468,16 +475,18 @@ void perf_trace_buf_update(void *record, u16 type)
> 
>  	entry = perf_trace_buf_alloc(ENTRY_SIZE, NULL, &rctx);
>  	if (!entry)
> -		goto out;
> +		goto unlock;
> 
>  	entry->ip = ip;
>  	entry->parent_ip = parent_ip;
>  	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, TRACE_FN,
>  			      1, &regs, &head, NULL);
> 
> -out:
> +unlock:
>  	ftrace_test_recursion_unlock(bit);
>  #undef ENTRY_SIZE
> +out:
> +	preempt_enable_notrace();
>  }
> 
>  static int perf_ftrace_function_register(struct perf_event *event)



* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-10-09  0:03 ` Steven Rostedt
@ 2021-10-11  2:38   ` 王贇
  2021-10-11  8:32     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: 王贇 @ 2021-10-11  2:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, open list, Peter Zijlstra, Jiri Olsa



On 2021/10/9 8:03 AM, Steven Rostedt wrote:
> On Tue, 28 Sep 2021 11:12:24 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
> 
>> With CONFIG_DEBUG_PREEMPT we observed reports like:
>>
>>   BUG: using smp_processor_id() in preemptible
>>   caller is perf_ftrace_function_call+0x6f/0x2e0
>>   CPU: 1 PID: 680 Comm: a.out Not tainted
>>   Call Trace:
>>    <TASK>
>>    dump_stack_lvl+0x8d/0xcf
>>    check_preemption_disabled+0x104/0x110
>>    ? optimize_nops.isra.7+0x230/0x230
>>    ? text_poke_bp_batch+0x9f/0x310
>>    perf_ftrace_function_call+0x6f/0x2e0
>>    ...
> 
> It would be useful to see the full backtrace, to know how this
> happened. Please do not shorten back traces.

Here it is. I suppose somewhere on the __text_poke() path something failed
to be exempted from ftrace, in a context where preemption is allowed?

[   18.172274][  T678] BUG: using smp_processor_id() in preemptible [00000000] code: a.out/678
[   18.173734][  T678] caller is perf_ftrace_function_call+0x6f/0x2d0
[   18.174781][  T678] CPU: 0 PID: 678 Comm: a.out Not tainted 5.15.0-rc3-next-20210928+ #637
[   18.176101][  T678] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   18.177137][  T678] Call Trace:
[   18.177696][  T678]  <TASK>
[   18.178197][  T678]  dump_stack_lvl+0x8e/0xd1
[   18.178941][  T678]  check_preemption_disabled+0x103/0x110
[   18.179847][  T678]  ? text_poke_bp_batch+0x95/0x240
[   18.180673][  T678]  ? text_poke_loc_init+0x110/0x110
[   18.181506][  T678]  perf_ftrace_function_call+0x6f/0x2d0
[   18.182394][  T678]  ? 0xffffffffa0106083
[   18.183092][  T678]  ? _find_next_bit+0x10a/0x130
[   18.183881][  T678]  ? find_held_lock+0x2b/0x80
[   18.184648][  T678]  ? exc_int3+0xb/0xd0
[   18.185325][  T678]  0xffffffffa0106083
[   18.185996][  T678]  ? __text_poke+0x580/0x580
[   18.186744][  T678]  ? __text_poke+0x5/0x580
[   18.187471][  T678]  __text_poke+0x5/0x580
[   18.188214][  T678]  text_poke_bp_batch+0x95/0x240
[   18.189071][  T678]  text_poke_flush+0x5d/0x80
[   18.189872][  T678]  ? do_clear_cpu_cap+0x150/0x150
[   18.190732][  T678]  text_poke_queue+0x28/0x90
[   18.191531][  T678]  ftrace_replace_code+0x10f/0x160
[   18.192416][  T678]  ftrace_modify_all_code+0x10d/0x1e0
[   18.193334][  T678]  ? ftrace_ops_test+0xa0/0xa0
[   18.194159][  T678]  ftrace_run_update_code+0x1d/0x80
[   18.195038][  T678]  ftrace_startup.part.0+0x139/0x1b0
[   18.195946][  T678]  register_ftrace_function+0x69/0xb0
[   18.196871][  T678]  perf_ftrace_event_register+0x7c/0xf0
[   18.197811][  T678]  perf_trace_event_init+0x60/0x80
[   18.198689][  T678]  perf_trace_init+0xac/0x120
[   18.199500][  T678]  ? perf_init_event+0x170/0x3a0
[   18.200342][  T678]  perf_tp_event_init+0x2f/0x70
[   18.201185][  T678]  perf_try_init_event+0x56/0x1b0
[   18.202047][  T678]  ? perf_init_event+0x170/0x3a0
[   18.202896][  T678]  perf_init_event+0x13c/0x3a0
[   18.203727][  T678]  perf_event_alloc.part.0+0x47c/0xe50
[   18.204657][  T678]  __do_sys_perf_event_open+0x27b/0x1350
[   18.205624][  T678]  ? do_user_addr_fault+0x2e7/0x8d0
[   18.206539][  T678]  do_syscall_64+0x35/0x80
[   18.207311][  T678]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.208294][  T678] RIP: 0033:0x7ff74a67d4dd
[   18.209068][  T678] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 89 0c 00 f7 d8 64 89 01 48
[   18.212166][  T678] RSP: 002b:00007ffc36a6bf18 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[   18.213566][  T678] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff74a67d4dd
[   18.214903][  T678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
[   18.216232][  T678] RBP: 00007ffc36a6bf20 R08: 0000000000000000 R09: 0000000000401660
[   18.217584][  T678] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401040
[   18.218914][  T678] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   18.220283][  T678]  </TASK>
[   18.220910][  T678] BUG: using smp_processor_id() in preemptible [00000000] code: a.out/678
[   18.222362][  T678] caller is perf_ftrace_function_call+0x6f/0x2d0
[   18.223441][  T678] CPU: 0 PID: 678 Comm: a.out Not tainted 5.15.0-rc3-next-20210928+ #637
[   18.224855][  T678] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   18.225946][  T678] Call Trace:
[   18.226552][  T678]  <TASK>
[   18.227102][  T678]  dump_stack_lvl+0x8e/0xd1
[   18.227885][  T678]  check_preemption_disabled+0x103/0x110
[   18.228837][  T678]  ? text_poke_bp_batch+0x95/0x240
[   18.229714][  T678]  ? text_poke_loc_init+0x110/0x110
[   18.230599][  T678]  perf_ftrace_function_call+0x6f/0x2d0
[   18.231541][  T678]  ? 0xffffffffa0106083
[   18.232283][  T678]  ? _find_next_bit+0x10a/0x130
[   18.233118][  T678]  ? find_held_lock+0x2b/0x80
[   18.233926][  T678]  ? exc_int3+0xb/0xd0
[   18.234649][  T678]  0xffffffffa0106083
[   18.235367][  T678]  ? do_sync_core+0x30/0x30
[   18.236153][  T678]  ? __text_poke+0x5/0x580
[   18.236920][  T678]  __text_poke+0x5/0x580
[   18.237666][  T678]  text_poke_bp_batch+0x95/0x240
[   18.238527][  T678]  text_poke_flush+0x5d/0x80
[   18.239321][  T678]  ? do_clear_cpu_cap+0x150/0x150
[   18.240174][  T678]  text_poke_queue+0x28/0x90
[   18.240980][  T678]  ftrace_replace_code+0x10f/0x160
[   18.241859][  T678]  ftrace_modify_all_code+0x10d/0x1e0
[   18.242772][  T678]  ? ftrace_ops_test+0xa0/0xa0
[   18.243593][  T678]  ftrace_run_update_code+0x1d/0x80
[   18.244474][  T678]  ftrace_startup.part.0+0x139/0x1b0
[   18.245376][  T678]  register_ftrace_function+0x69/0xb0
[   18.246290][  T678]  perf_ftrace_event_register+0x7c/0xf0
[   18.247237][  T678]  perf_trace_event_init+0x60/0x80
[   18.248114][  T678]  perf_trace_init+0xac/0x120
[   18.248919][  T678]  ? perf_init_event+0x170/0x3a0
[   18.249767][  T678]  perf_tp_event_init+0x2f/0x70
[   18.250622][  T678]  perf_try_init_event+0x56/0x1b0
[   18.251492][  T678]  ? perf_init_event+0x170/0x3a0
[   18.252336][  T678]  perf_init_event+0x13c/0x3a0
[   18.253167][  T678]  perf_event_alloc.part.0+0x47c/0xe50
[   18.254111][  T678]  __do_sys_perf_event_open+0x27b/0x1350
[   18.255080][  T678]  ? do_user_addr_fault+0x2e7/0x8d0
[   18.256002][  T678]  do_syscall_64+0x35/0x80
[   18.256767][  T678]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.257747][  T678] RIP: 0033:0x7ff74a67d4dd
[   18.258521][  T678] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 89 0c 00 f7 d8 64 89 01 48
[   18.261620][  T678] RSP: 002b:00007ffc36a6bf18 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[   18.263025][  T678] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff74a67d4dd
[   18.264363][  T678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
[   18.265702][  T678] RBP: 00007ffc36a6bf20 R08: 0000000000000000 R09: 0000000000401660
[   18.267050][  T678] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401040
[   18.268383][  T678] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   18.269758][  T678]  </TASK>
[   18.270358][  T678] BUG: using smp_processor_id() in preemptible [00000000] code: a.out/678
[   18.271809][  T678] caller is perf_ftrace_function_call+0x6f/0x2d0
[   18.272889][  T678] CPU: 0 PID: 678 Comm: a.out Not tainted 5.15.0-rc3-next-20210928+ #637
[   18.274387][  T678] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   18.275473][  T678] Call Trace:
[   18.276074][  T678]  <TASK>


And this is how to reproduce it:

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define BITMASK(bf_off,bf_len) (((1ull << (bf_len)) - 1) << (bf_off))
#define STORE_BY_BITMASK(type,htobe,addr,val,bf_off,bf_len) *(type*)(addr) = htobe((htobe(*(type*)(addr)) & ~BITMASK((bf_off), (bf_len))) | (((type)(val) << (bf_off)) & BITMASK((bf_off), (bf_len))))

int main(void)
{
	syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
	syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
	syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

*(uint32_t*)0x20000080 = 2;
*(uint32_t*)0x20000084 = 0x80;
*(uint8_t*)0x20000088 = 1;
*(uint8_t*)0x20000089 = 0;
*(uint8_t*)0x2000008a = 0;
*(uint8_t*)0x2000008b = 0;
*(uint32_t*)0x2000008c = 0;
*(uint64_t*)0x20000090 = 0;
*(uint64_t*)0x20000098 = 0;
*(uint64_t*)0x200000a0 = 0;
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 0, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 1, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 2, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 3, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 4, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 5, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 6, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 7, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 8, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 9, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 10, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 11, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 12, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 13, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 14, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 15, 2);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 17, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 18, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 19, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 20, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 21, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 22, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 23, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 24, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 25, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 26, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 27, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 28, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 29, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 30, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 31, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 32, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 33, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 34, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 35, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 36, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 37, 1);
STORE_BY_BITMASK(uint64_t, , 0x200000a8, 0, 38, 26);
*(uint32_t*)0x200000b0 = 4;
*(uint32_t*)0x200000b4 = 0;
*(uint64_t*)0x200000b8 = 0;
*(uint64_t*)0x200000c0 = 0;
*(uint64_t*)0x200000c8 = 0;
*(uint64_t*)0x200000d0 = 0;
*(uint32_t*)0x200000d8 = 0;
*(uint32_t*)0x200000dc = 0;
*(uint64_t*)0x200000e0 = 0;
*(uint32_t*)0x200000e8 = 0;
*(uint16_t*)0x200000ec = 0;
*(uint16_t*)0x200000ee = 0;
*(uint32_t*)0x200000f0 = 0;
*(uint32_t*)0x200000f4 = 0;
*(uint64_t*)0x200000f8 = 0;
	syscall(__NR_perf_event_open, 0x20000080ul, 0, 0ul, -1, 0ul);
	return 0;
}

Regards,
Michael Wang

> 
> -- Steve
> 
>>    __text_poke+0x5/0x620
>>    text_poke_bp_batch+0x9f/0x310
>>
>> This telling us the CPU could be changed after task is preempted, and
>> the checking on CPU before preemption will be invalid.
>>
>> This patch just turn off preemption in perf_ftrace_function_call()
>> to prevent CPU changing.
>>
>> Reported-by: Abaci <abaci@linux.alibaba.com>
>> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
>> ---
>>  kernel/trace/trace_event_perf.c | 17 +++++++++++++----
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
>> index 6aed10e..dcbefdf 100644
>> --- a/kernel/trace/trace_event_perf.c
>> +++ b/kernel/trace/trace_event_perf.c
>> @@ -441,12 +441,19 @@ void perf_trace_buf_update(void *record, u16 type)
>>  	if (!rcu_is_watching())
>>  		return;
>>
>> +	/*
>> +	 * Prevent CPU changing from now on. rcu must
>> +	 * be in watching if the task was migrated and
>> +	 * scheduled.
>> +	 */
>> +	preempt_disable_notrace();
>> +
>>  	if ((unsigned long)ops->private != smp_processor_id())
>> -		return;
>> +		goto out;
>>
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>> -		return;
>> +		goto out;
>>
>>  	event = container_of(ops, struct perf_event, ftrace_ops);
>>
>> @@ -468,16 +475,18 @@ void perf_trace_buf_update(void *record, u16 type)
>>
>>  	entry = perf_trace_buf_alloc(ENTRY_SIZE, NULL, &rctx);
>>  	if (!entry)
>> -		goto out;
>> +		goto unlock;
>>
>>  	entry->ip = ip;
>>  	entry->parent_ip = parent_ip;
>>  	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, TRACE_FN,
>>  			      1, &regs, &head, NULL);
>>
>> -out:
>> +unlock:
>>  	ftrace_test_recursion_unlock(bit);
>>  #undef ENTRY_SIZE
>> +out:
>> +	preempt_enable_notrace();
>>  }
>>
>>  static int perf_ftrace_function_register(struct perf_event *event)


* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-10-11  2:38   ` 王贇
@ 2021-10-11  8:32     ` Peter Zijlstra
  2021-10-11  8:48       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2021-10-11  8:32 UTC (permalink / raw)
  To: 王贇; +Cc: Steven Rostedt, Ingo Molnar, open list, Jiri Olsa

On Mon, Oct 11, 2021 at 10:38:39AM +0800, 王贇 wrote:
> 
> 
> > On 2021/10/9 8:03 AM, Steven Rostedt wrote:
> > On Tue, 28 Sep 2021 11:12:24 +0800
> > 王贇 <yun.wang@linux.alibaba.com> wrote:
> > 
> >> With CONFIG_DEBUG_PREEMPT we observed reports like:
> >>
> >>   BUG: using smp_processor_id() in preemptible
> >>   caller is perf_ftrace_function_call+0x6f/0x2e0
> >>   CPU: 1 PID: 680 Comm: a.out Not tainted
> >>   Call Trace:
> >>    <TASK>
> >>    dump_stack_lvl+0x8d/0xcf
> >>    check_preemption_disabled+0x104/0x110
> >>    ? optimize_nops.isra.7+0x230/0x230
> >>    ? text_poke_bp_batch+0x9f/0x310
> >>    perf_ftrace_function_call+0x6f/0x2e0
> >>    ...
> > 
> > It would be useful to see the full backtrace, to know how this
> > happened. Please do not shorten back traces.
> 
> Here it is, I suppose some where on the path of __text_poke failed to
> exempt from ftrace, whose context is allowed to be preempted?

Urgh, 'funny' trace that... it ends up tracing text_poke while it's
installing the ftrace handlers.

But no, that looks quite all right.

Your patch is also right, I think, except we can improve upon it by
having that recursion_trylock imply preemption disable (as is proper for
a 'lock').

Also, make sure preemption is disabled before taking the lock and only
enabled again after releasing it. That way we don't have preemption holes.

---
 arch/csky/kernel/probes/ftrace.c     |  2 --
 arch/parisc/kernel/ftrace.c          |  2 --
 arch/powerpc/kernel/kprobes-ftrace.c |  2 --
 arch/s390/kernel/ftrace.c            |  2 --
 arch/x86/kernel/kprobes/ftrace.c     |  2 --
 include/linux/trace_recursion.h      | 10 +++++++++-
 kernel/livepatch/patch.c             |  6 ------
 kernel/trace/trace_functions.c       |  5 -----
 8 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index ef2bb9bd9605..dff7921531ef 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -24,7 +24,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;
 
 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (!p) {
 		p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -64,7 +63,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 0a1e75af5382..3543496174d3 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -216,7 +216,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;
 
 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -245,7 +244,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	}
 	__this_cpu_write(current_kprobe, NULL);
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58338cc..072ebe7f290b 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 		return;
 
 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)nip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 1d94ffdf347b..fa1e3510ea98 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -367,7 +367,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;
 
 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -395,7 +394,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	}
 	__this_cpu_write(current_kprobe, NULL);
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f6d3a5..dd2ec14adb77 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		return;
 
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c5714e65..ca12e2d8e060 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -214,7 +214,14 @@ static __always_inline void trace_clear_recursion(int bit)
 static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
 							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	bool ret;
+
+	preempt_disable_notrace();
+	ret = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	if (!ret)
+		preempt_enable_notrace();
+
+	return ret;
 }
 
 /**
@@ -226,6 +233,7 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
 static __always_inline void ftrace_test_recursion_unlock(int bit)
 {
 	trace_clear_recursion(bit);
+	preempt_enable_notrace();
 }
 
 #endif /* CONFIG_TRACING */
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029aea67f1..6e66ccdefaa9 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -52,11 +52,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (WARN_ON_ONCE(bit < 0))
 		return;
-	/*
-	 * A variant of synchronize_rcu() is used to allow patching functions
-	 * where RCU is not watching, see klp_synchronize_transition().
-	 */
-	preempt_disable_notrace();
 
 	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
 				      stack_node);
@@ -120,7 +115,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 	klp_arch_set_pc(fregs, (unsigned long)func->new_func);
 
 unlock:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f5d1f9..9f1bfbe105e8 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
 		return;
 
 	trace_ctx = tracing_gen_ctx();
-	preempt_disable_notrace();
 
 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
 		trace_function(tr, ip, parent_ip, trace_ctx);
 
 	ftrace_test_recursion_unlock(bit);
-	preempt_enable_notrace();
 }
 
 #ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@ function_no_repeats_trace_call(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		return;
 
-	preempt_disable_notrace();
-
 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
 	if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@ function_no_repeats_trace_call(unsigned long ip, unsigned long parent_ip,
 
 out:
 	ftrace_test_recursion_unlock(bit);
-	preempt_enable_notrace();
 }
 
 static void


* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-10-11  8:32     ` Peter Zijlstra
@ 2021-10-11  8:48       ` Peter Zijlstra
  2021-10-11  9:39         ` 王贇
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2021-10-11  8:48 UTC (permalink / raw)
  To: 王贇; +Cc: Steven Rostedt, Ingo Molnar, open list, Jiri Olsa

On Mon, Oct 11, 2021 at 10:32:46AM +0200, Peter Zijlstra wrote:
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index a9f9c5714e65..ca12e2d8e060 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -214,7 +214,14 @@ static __always_inline void trace_clear_recursion(int bit)
>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>  							 unsigned long parent_ip)
>  {
> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> +	bool ret;
> +
> +	preempt_disable_notrace();
> +	ret = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> +	if (!ret)
> +		preempt_enable_notrace();
> +
> +	return ret;
>  }
>  
>  /**

Oh, I might've gotten that wrong; I assumed regular trylock semantics
(a boolean return), but it doesn't look like that's right.


* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-10-11  8:48       ` Peter Zijlstra
@ 2021-10-11  9:39         ` 王贇
  2021-10-11 14:45           ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: 王贇 @ 2021-10-11  9:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Steven Rostedt, Ingo Molnar, open list, Jiri Olsa



On 2021/10/11 4:48 PM, Peter Zijlstra wrote:
> On Mon, Oct 11, 2021 at 10:32:46AM +0200, Peter Zijlstra wrote:
>> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
>> index a9f9c5714e65..ca12e2d8e060 100644
>> --- a/include/linux/trace_recursion.h
>> +++ b/include/linux/trace_recursion.h
>> @@ -214,7 +214,14 @@ static __always_inline void trace_clear_recursion(int bit)
>>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>>  							 unsigned long parent_ip)
>>  {
>> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> +	bool ret;
>> +
>> +	preempt_disable_notrace();
>> +	ret = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> +	if (!ret)
>> +		preempt_enable_notrace();
>> +
>> +	return ret;
>>  }
>>  
>>  /**
> 
> Oh, I might've gotten that wrong, I assumed regular trylock semantics,
> but it doesn't look like that's right.

I will use bit instead of ret and give it some testing :-)
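
Roughly like this, I suppose (untested sketch on top of your diff, assuming
trace_test_and_set_recursion() keeps returning the recursion bit on success
and a negative value on failure):

static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
							  unsigned long parent_ip)
{
	int bit;

	preempt_disable_notrace();
	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
	/* no "lock" taken on failure, so drop the preempt count again */
	if (bit < 0)
		preempt_enable_notrace();

	return bit;
}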

BTW, would you prefer me to merge these changes into this patch, or send
another patch with your Suggested-by?

Regards,
Michael Wang

> 


* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-10-11  9:39         ` 王贇
@ 2021-10-11 14:45           ` Peter Zijlstra
  2021-10-12  5:51             ` 王贇
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2021-10-11 14:45 UTC (permalink / raw)
  To: 王贇; +Cc: Steven Rostedt, Ingo Molnar, open list, Jiri Olsa

On Mon, Oct 11, 2021 at 05:39:32PM +0800, 王贇 wrote:
> 
> 
> > On 2021/10/11 4:48 PM, Peter Zijlstra wrote:
> > On Mon, Oct 11, 2021 at 10:32:46AM +0200, Peter Zijlstra wrote:
> >> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> >> index a9f9c5714e65..ca12e2d8e060 100644
> >> --- a/include/linux/trace_recursion.h
> >> +++ b/include/linux/trace_recursion.h
> >> @@ -214,7 +214,14 @@ static __always_inline void trace_clear_recursion(int bit)
> >>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> >>  							 unsigned long parent_ip)
> >>  {
> >> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> >> +	bool ret;
> >> +
> >> +	preempt_disable_notrace();
> >> +	ret = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> >> +	if (!ret)
> >> +		preempt_enable_notrace();
> >> +
> >> +	return ret;
> >>  }
> >>  
> >>  /**
> > 
> > Oh, I might've gotten that wrong, I assumed regular trylock semantics,
> > but it doesn't look like that's right.
> 
> I will use bit instead ret and give some testing :-)
> 
> BTW, would you prefer to merge these changes into this patch or maybe send
> another patch with your suggested-by?

Yeah, please send another patch once you've confirmed it actually works,
etc. I did this before waking up (as evidenced by the above), so who knows
what else I did wrong :-)


* Re: [RESEND PATCH v2] trace: prevent preemption in perf_ftrace_function_call()
  2021-10-11 14:45           ` Peter Zijlstra
@ 2021-10-12  5:51             ` 王贇
  0 siblings, 0 replies; 8+ messages in thread
From: 王贇 @ 2021-10-12  5:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Steven Rostedt, Ingo Molnar, open list, Jiri Olsa



On 2021/10/11 10:45 PM, Peter Zijlstra wrote:
[snip]
>>>
>>> Oh, I might've gotten that wrong, I assumed regular trylock semantics,
>>> but it doesn't look like that's right.
>>
>> I will use bit instead ret and give some testing :-)
>>
>> BTW, would you prefer to merge these changes into this patch or maybe send
>> another patch with your suggested-by?
> 
> Yeah, please send another patch; once you've confirmed it actually works
> etc.. I did this before waking (as evidence per the above), who knows
> what else I did wrong :-)

I've sent:
  [PATCH 0/2] ftrace: make sure preemption disabled on recursion testing

It should have taken care of all the places, but it is only tested on x86
since I have no machines for the other architectures... logically it should
be fine.

Regards,
Michael Wang

> 

