* [PATCH v3 0/2] fix & prevent the missing preemption disabling
@ 2021-10-13  8:51 王贇
  2021-10-13  8:51 ` [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock() 王贇
  2021-10-13  8:52 ` [PATCH v3 2/2] ftrace: do CPU checking after preemption disabled 王贇
From: 王贇 @ 2021-10-13  8:51 UTC
  To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Colin Ian King, Masami Hiramatsu,
	Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

Testing shows that perf_ftrace_function_call() uses smp_processor_id()
with preemption enabled, so any check on the CPU can be wrong after preemption.

As Peter pointed out, the section between the ftrace_test_recursion_trylock/unlock()
pair requires preemption to be disabled, as 'Documentation/trace/ftrace-uses.rst'
explains, but currently that work is done outside of the helpers.

Patch 1/2 makes sure preemption is disabled after trylock() succeeds;
patch 2/2 does the smp_processor_id() check after trylock() to address the
issue.
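
For illustration, a rough sketch (not part of the patches; my_callback()
and do_work() are made-up names) of what a callback looks like once the
helpers handle preemption themselves:

	static void notrace my_callback(unsigned long ip, unsigned long parent_ip,
					struct ftrace_ops *op, struct ftrace_regs *fregs)
	{
		int bit;

		bit = ftrace_test_recursion_trylock(ip, parent_ip);
		if (bit < 0)
			return;

		/* preemption is disabled here, smp_processor_id() is stable */
		do_work(smp_processor_id());

		/* re-enables preemption if it was enabled before trylock() */
		ftrace_test_recursion_unlock(bit);
	}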

v1: https://lore.kernel.org/all/8c7de46d-9869-aa5e-2bb9-5dbc2eda395e@linux.alibaba.com/
v2: https://lore.kernel.org/all/b1d7fe43-ce84-0ed7-32f7-ea1d12d0b716@linux.alibaba.com/

Michael Wang (2):
  ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  ftrace: do CPU checking after preemption disabled

 arch/csky/kernel/probes/ftrace.c     |  2 --
 arch/parisc/kernel/ftrace.c          |  2 --
 arch/powerpc/kernel/kprobes-ftrace.c |  2 --
 arch/riscv/kernel/probes/ftrace.c    |  2 --
 arch/x86/kernel/kprobes/ftrace.c     |  2 --
 include/linux/trace_recursion.h      | 17 ++++++++++++++++-
 kernel/livepatch/patch.c             | 13 +++++++------
 kernel/trace/trace_event_perf.c      |  6 +++---
 kernel/trace/trace_functions.c       |  5 -----
 9 files changed, 26 insertions(+), 25 deletions(-)

-- 
1.8.3.1


* [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-13  8:51 [PATCH v3 0/2] fix & prevent the missing preemption disabling 王贇
@ 2021-10-13  8:51 ` 王贇
  2021-10-14  9:13   ` Miroslav Benes
  2021-10-14 15:14   ` Petr Mladek
  2021-10-13  8:52 ` [PATCH v3 2/2] ftrace: do CPU checking after preemption disabled 王贇
From: 王贇 @ 2021-10-13  8:51 UTC
  To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Colin Ian King, Masami Hiramatsu,
	Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

As the documentation explains, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() are supposed to disable and
enable preemption properly; however, this work is currently done
outside of the helpers, where it can be missed by mistake.

This patch makes sure preemption is disabled when trylock()
succeeds, and unlock() will re-enable preemption if it was
previously enabled.
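
In callback terms the change looks roughly like this (sketch only; the
real per-callsite diffs follow below):

	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;
-	preempt_disable_notrace();
	...
-	preempt_enable_notrace();
	ftrace_test_recursion_unlock(bit);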

CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
 arch/csky/kernel/probes/ftrace.c     |  2 --
 arch/parisc/kernel/ftrace.c          |  2 --
 arch/powerpc/kernel/kprobes-ftrace.c |  2 --
 arch/riscv/kernel/probes/ftrace.c    |  2 --
 arch/x86/kernel/kprobes/ftrace.c     |  2 --
 include/linux/trace_recursion.h      | 17 ++++++++++++++++-
 kernel/livepatch/patch.c             | 13 +++++++------
 kernel/trace/trace_functions.c       |  5 -----
 8 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228..834cffc 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;

 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (!p) {
 		p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 0a1e75a..3543496 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -216,7 +216,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;

 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -245,7 +244,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	}
 	__this_cpu_write(current_kprobe, NULL);
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58..072ebe7 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 		return;

 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)nip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a8..7142ec4 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		return;

-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f..dd2ec14 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		return;

-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c57..58e474c 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -214,7 +214,18 @@ static __always_inline void trace_clear_recursion(int bit)
 static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
 							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	int bit;
+
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	/*
+	 * The zero bit indicates we are nested
+	 * in another trylock(), which means
+	 * preemption is already disabled.
+	 */
+	if (bit > 0)
+		preempt_disable_notrace();
+
+	return bit;
 }

 /**
@@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
  * @bit: The return of a successful ftrace_test_recursion_trylock()
  *
  * This is used at the end of a ftrace callback.
+ *
+ * Preemption will be enabled (if it was previously enabled).
  */
 static __always_inline void ftrace_test_recursion_unlock(int bit)
 {
+	if (bit)
+		preempt_enable_notrace();
 	trace_clear_recursion(bit);
 }

diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029ae..b8d75fb 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,

 	ops = container_of(fops, struct klp_ops, fops);

+	/*
+	 *
+	 * The ftrace_test_recursion_trylock() will disable preemption,
+	 * which is required for the variant of synchronize_rcu() that is
+	 * used to allow patching functions where RCU is not watching.
+	 * See klp_synchronize_transition() for more details.
+	 */
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (WARN_ON_ONCE(bit < 0))
 		return;
-	/*
-	 * A variant of synchronize_rcu() is used to allow patching functions
-	 * where RCU is not watching, see klp_synchronize_transition().
-	 */
-	preempt_disable_notrace();

 	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
 				      stack_node);
@@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 	klp_arch_set_pc(fregs, (unsigned long)func->new_func);

 unlock:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }

diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f..9f1bfbe 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@ static void function_trace_start(struct trace_array *tr)
 		return;

 	trace_ctx = tracing_gen_ctx();
-	preempt_disable_notrace();

 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@ static void function_trace_start(struct trace_array *tr)
 		trace_function(tr, ip, parent_ip, trace_ctx);

 	ftrace_test_recursion_unlock(bit);
-	preempt_enable_notrace();
 }

 #ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@ static inline void process_repeats(struct trace_array *tr,
 	if (bit < 0)
 		return;

-	preempt_disable_notrace();
-
 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
 	if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@ static inline void process_repeats(struct trace_array *tr,

 out:
 	ftrace_test_recursion_unlock(bit);
-	preempt_enable_notrace();
 }

 static void
-- 
1.8.3.1



* [PATCH v3 2/2] ftrace: do CPU checking after preemption disabled
  2021-10-13  8:51 [PATCH v3 0/2] fix & prevent the missing preemption disabling 王贇
  2021-10-13  8:51 ` [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock() 王贇
@ 2021-10-13  8:52 ` 王贇
From: 王贇 @ 2021-10-13  8:52 UTC
  To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Colin Ian King, Masami Hiramatsu,
	Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

With CONFIG_DEBUG_PREEMPT we observed reports like:

  BUG: using smp_processor_id() in preemptible
  caller is perf_ftrace_function_call+0x6f/0x2e0
  CPU: 1 PID: 680 Comm: a.out Not tainted
  Call Trace:
   <TASK>
   dump_stack_lvl+0x8d/0xcf
   check_preemption_disabled+0x104/0x110
   ? optimize_nops.isra.7+0x230/0x230
   ? text_poke_bp_batch+0x9f/0x310
   perf_ftrace_function_call+0x6f/0x2e0
   ...
   __text_poke+0x5/0x620
   text_poke_bp_batch+0x9f/0x310

This tells us that the CPU can change after the task is preempted, and
any check on the CPU made before preemption is disabled will be invalid.

Since ftrace_test_recursion_trylock() now helps to disable
preemption, this patch simply does the check after trylock() to address
the issue.
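
Roughly, the difference (sketch only, mirroring the diff below):

	/* before: checked with preemption enabled, the CPU may change */
	if ((unsigned long)ops->private != smp_processor_id())
		return;
	bit = ftrace_test_recursion_trylock(ip, parent_ip);

	/* after: trylock() disables preemption, so the id is stable */
	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;
	if ((unsigned long)ops->private != smp_processor_id())
		goto out;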

CC: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
 kernel/trace/trace_event_perf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 6aed10e..fba8cb7 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -441,13 +441,13 @@ void perf_trace_buf_update(void *record, u16 type)
 	if (!rcu_is_watching())
 		return;

-	if ((unsigned long)ops->private != smp_processor_id())
-		return;
-
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;

+	if ((unsigned long)ops->private != smp_processor_id())
+		goto out;
+
 	event = container_of(ops, struct perf_event, ftrace_ops);

 	/*
-- 
1.8.3.1


* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-13  8:51 ` [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock() 王贇
@ 2021-10-14  9:13   ` Miroslav Benes
  2021-10-14  9:22     ` 王贇
  2021-10-14 13:28     ` Steven Rostedt
  2021-10-14 15:14   ` Petr Mladek
From: Miroslav Benes @ 2021-10-14  9:13 UTC
  To: 王贇
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Petr Mladek, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> index e8029ae..b8d75fb 100644
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> 
>  	ops = container_of(fops, struct klp_ops, fops);
> 
> +	/*
> +	 *

This empty line is not useful.

> +	 * The ftrace_test_recursion_trylock() will disable preemption,
> +	 * which is required for the variant of synchronize_rcu() that is
> +	 * used to allow patching functions where RCU is not watching.
> +	 * See klp_synchronize_transition() for more details.
> +	 */
>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>  	if (WARN_ON_ONCE(bit < 0))
>  		return;
> -	/*
> -	 * A variant of synchronize_rcu() is used to allow patching functions
> -	 * where RCU is not watching, see klp_synchronize_transition().
> -	 */
> -	preempt_disable_notrace();
> 
>  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>  				      stack_node);
> @@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>  	klp_arch_set_pc(fregs, (unsigned long)func->new_func);
> 
>  unlock:
> -	preempt_enable_notrace();
>  	ftrace_test_recursion_unlock(bit);
>  }

Acked-by: Miroslav Benes <mbenes@suse.cz>

for the livepatch part of the patch.

I would also ask you not to submit new versions so often, so that the 
other reviewers have time to actually review the patch set.

Quoting from Documentation/process/submitting-patches.rst:

"Wait for a minimum of one week before resubmitting or pinging reviewers - 
possibly longer during busy times like merge windows."

Thanks

Miroslav

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-14  9:13   ` Miroslav Benes
@ 2021-10-14  9:22     ` 王贇
  2021-10-14 13:28     ` Steven Rostedt
From: 王贇 @ 2021-10-14  9:22 UTC
  To: Miroslav Benes
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Petr Mladek, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching



On 2021/10/14 5:13 PM, Miroslav Benes wrote:
>> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
>> index e8029ae..b8d75fb 100644
>> --- a/kernel/livepatch/patch.c
>> +++ b/kernel/livepatch/patch.c
>> @@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>>
>>  	ops = container_of(fops, struct klp_ops, fops);
>>
>> +	/*
>> +	 *
> 
> This empty line is not useful.
> 
>> +	 * The ftrace_test_recursion_trylock() will disable preemption,
>> +	 * which is required for the variant of synchronize_rcu() that is
>> +	 * used to allow patching functions where RCU is not watching.
>> +	 * See klp_synchronize_transition() for more details.
>> +	 */
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (WARN_ON_ONCE(bit < 0))
>>  		return;
>> -	/*
>> -	 * A variant of synchronize_rcu() is used to allow patching functions
>> -	 * where RCU is not watching, see klp_synchronize_transition().
>> -	 */
>> -	preempt_disable_notrace();
>>
>>  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>>  				      stack_node);
>> @@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>>  	klp_arch_set_pc(fregs, (unsigned long)func->new_func);
>>
>>  unlock:
>> -	preempt_enable_notrace();
>>  	ftrace_test_recursion_unlock(bit);
>>  }
> 
> Acked-by: Miroslav Benes <mbenes@suse.cz>
> 
> for the livepatch part of the patch.
> 
> I would also ask you not to submit new versions so often, so that the 
> other reviewers have time to actually review the patch set.
> 
> Quoting from Documentation/process/submitting-patches.rst:
> 
> "Wait for a minimum of one week before resubmitting or pinging reviewers - 
> possibly longer during busy times like merge windows."

Thanks for the Ack and suggestion, will take care from now on :-)

Regards,
Michael Wang

> 
> Thanks
> 
> Miroslav
> 

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-14  9:13   ` Miroslav Benes
  2021-10-14  9:22     ` 王贇
@ 2021-10-14 13:28     ` Steven Rostedt
From: Steven Rostedt @ 2021-10-14 13:28 UTC
  To: Miroslav Benes
  Cc: 王贇,
	Guo Ren, Ingo Molnar, James E.J. Bottomley, Helge Deller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
	Borislav Petkov, x86, H. Peter Anvin, Josh Poimboeuf,
	Jiri Kosina, Petr Mladek, Joe Lawrence, Colin Ian King,
	Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

On Thu, 14 Oct 2021 11:13:13 +0200 (CEST)
Miroslav Benes <mbenes@suse.cz> wrote:

> for the livepatch part of the patch.
> 
> I would also ask you not to submit new versions so often, so that the 
> other reviewers have time to actually review the patch set.
> 
> Quoting from Documentation/process/submitting-patches.rst:
> 
> "Wait for a minimum of one week before resubmitting or pinging reviewers - 
> possibly longer during busy times like merge windows."

Although, for updates on a small patch set, I would say just a couple of
days instead of a week. It's annoying when you have a 15-patch series
that gets updated on a daily basis. But for something with only 2 patches,
wait just two days. At least, that's how I feel.

-- Steve

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-13  8:51 ` [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock() 王贇
  2021-10-14  9:13   ` Miroslav Benes
@ 2021-10-14 15:14   ` Petr Mladek
  2021-10-15  3:13     ` 王贇
  2021-10-15  3:39     ` Steven Rostedt
From: Petr Mladek @ 2021-10-14 15:14 UTC
  To: 王贇
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

On Wed 2021-10-13 16:51:46, 王贇 wrote:
> As the documentation explains, ftrace_test_recursion_trylock()
> and ftrace_test_recursion_unlock() are supposed to disable and
> enable preemption properly; however, this work is currently done
> outside of the helpers, where it can be missed by mistake.
> 
> This patch makes sure preemption is disabled when trylock()
> succeeds, and unlock() will re-enable preemption if it was
> previously enabled.
> 
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index a9f9c57..58e474c 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -214,7 +214,18 @@ static __always_inline void trace_clear_recursion(int bit)

We should also update the description above the function, for example:

  /**
   * ftrace_test_recursion_trylock - tests for recursion in same context
   *
   * Use this for ftrace callbacks. This will detect if the function
   * tracing recursed in the same context (normal vs interrupt),
   *
   * Returns: -1 if a recursion happened.
-  *           >= 0 if no recursion
+  *           >= 0 if no recursion (success)
+  *
+  * Disables preemption on success. It is just for convenience.
+  * Current users need to disable preemption for various reasons.
   */


>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>  							 unsigned long parent_ip)
>  {
> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> +	int bit;
> +
> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> +	/*
> +	 * The zero bit indicates we are nested
> +	 * in another trylock(), which means
> +	 * preemption is already disabled.
> +	 */
> +	if (bit > 0)
> +		preempt_disable_notrace();

Is this safe? The preemption is disabled only when
trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().

We must either always disable preemption when bit >= 0,
or we have to disable preemption already in
trace_test_and_set_recursion().


Finally, the comment confused me a lot. The difference between nesting and
recursion is far from clear. And the code is tricky like hell :-)
I propose to add some comments, see below for a proposal.

> +
> +	return bit;
>  }
>  /**
> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>   * @bit: The return of a successful ftrace_test_recursion_trylock()
>   *
>   * This is used at the end of a ftrace callback.
> + *
> + * Preemption will be enabled (if it was previously enabled).
>   */
>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>  {
> +	if (bit)

This is not symmetric with trylock(). It should be:

	if (bit > 0)

Anyway, trace_clear_recursion() quietly ignores bit != 0


> +		preempt_enable_notrace();
>  	trace_clear_recursion(bit);
>  }


Below is my proposed patch that tries to better explain the recursion
check:

From 20d69f11e2683262fa0043b803999061cbac543f Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 14 Oct 2021 16:57:39 +0200
Subject: [PATCH] trace: Better describe the recursion check return values

The trace recursion check might be called recursively by different
layers of the tracing code. It is a safe recursion, and the check
is even optimized for this case.

The problematic recursion is when the traced function is called
by the tracing code. This is properly detected.

Try to explain this difference in a better way.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/trace_recursion.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c5714e65..b5efb804efdf 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -159,13 +159,27 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
 # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
 #endif
 
+/*
+ * trace_test_and_set_recursion() is called on several layers
+ * of the ftrace code when handling the same ftrace entry.
+ * These calls might be nested/recursive.
+ *
+ * It uses TRACE_LIST_*BITs to distinguish between this
+ * internal recursion and recursion caused by the ftrace
+ * code calling the traced function.
+ *
+ * Returns: > 0 when no recursion
+ *          0 when called recursively internally (safe)
+ *	    -1 when the traced function was called recursively from
+ *             the ftrace handler (unsafe)
+ */
 static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
 							int start, int max)
 {
 	unsigned int val = READ_ONCE(current->trace_recursion);
 	int bit;
 
-	/* A previous recursion check was made */
+	/* Called recursively internally by different ftrace code layers? */
 	if ((val & TRACE_CONTEXT_MASK) > max)
 		return 0;
 
-- 
2.26.2


* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-14 15:14   ` Petr Mladek
@ 2021-10-15  3:13     ` 王贇
  2021-10-15  4:45       ` 王贇
  2021-10-15  7:28       ` Petr Mladek
  2021-10-15  3:39     ` Steven Rostedt
  1 sibling, 2 replies; 13+ messages in thread
From: 王贇 @ 2021-10-15  3:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching



On 2021/10/14 11:14 PM, Petr Mladek wrote:
[snip]
>> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> +	int bit;
>> +
>> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> +	/*
>> +	 * The zero bit indicates we are nested
>> +	 * in another trylock(), which means
>> +	 * preemption is already disabled.
>> +	 */
>> +	if (bit > 0)
>> +		preempt_disable_notrace();
> 
> Is this safe? The preemption is disabled only when
> trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().
> 
> We must either always disable preemption when bit >= 0,
> or we have to disable preemption already in
> trace_test_and_set_recursion().

Internal calls of trace_test_and_set_recursion() will disable preemption
on success, so it should be safe.

We could also consider moving the logic into trace_test_and_set_recursion()
and trace_clear_recursion(), but I'm not very sure about that... ftrace
internally already makes sure preemption is disabled; what is uncovered is
those users who call the trylock/unlock API, isn't it?

> 
> 
> Finally, the comment confused me a lot. The difference between nesting and
> recursion is far from clear. And the code is tricky like hell :-)
> I propose to add some comments, see below for a proposal.
The comment is indeed confusing, I'll make it something like:

The zero bit indicates trace recursion happened; whether
the recursive call was made by the ftrace handler or by ftrace
itself, preemption is already disabled.

Does this one look better to you?

> 
>> +
>> +	return bit;
>>  }
>>  /**
>> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>>   * @bit: The return of a successful ftrace_test_recursion_trylock()
>>   *
>>   * This is used at the end of a ftrace callback.
>> + *
>> + * Preemption will be enabled (if it was previously enabled).
>>   */
>>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>>  {
>> +	if (bit)
> 
> This is not symmetric with trylock(). It should be:
> 
> 	if (bit > 0)
> 
> Anyway, trace_clear_recursion() quietly ignores bit != 0

Yes, bit == 0 should not happen here.

> 
> 
>> +		preempt_enable_notrace();
>>  	trace_clear_recursion(bit);
>>  }
> 
> 
> Below is my proposed patch that tries to better explain the recursion
> check:
> 
> From 20d69f11e2683262fa0043b803999061cbac543f Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 14 Oct 2021 16:57:39 +0200
> Subject: [PATCH] trace: Better describe the recursion check return values
> 
> The trace recursion check might be called recursively by different
> layers of the tracing code. It is safe recursion and the check
> is even optimized for this case.
> 
> The problematic recursion is when the traced function is called
> by the tracing code. This is properly detected.
> 
> Try to explain this difference a better way.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  include/linux/trace_recursion.h | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index a9f9c5714e65..b5efb804efdf 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -159,13 +159,27 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
>  # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
>  #endif
>  
> +/*
> + * trace_test_and_set_recursion() is called on several layers
> + * of the ftrace code when handling the same ftrace entry.
> + * These calls might be nested/recursive.
> + *
> + * It uses TRACE_LIST_*BITs to distinguish between this
> + * internal recursion and recursion caused by the ftrace
> + * code calling the traced function.
> + *
> + * Returns: > 0 when no recursion
> + *          0 when called recursively internally (safe)

The 0 can also happen when the ftrace handler recursively calls trylock()
in the same context, or not?

Regards,
Michael Wang

> + *	    -1 when the traced function was called recursively from
> + *             the ftrace handler (unsafe)
> + */
>  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
>  							int start, int max)
>  {
>  	unsigned int val = READ_ONCE(current->trace_recursion);
>  	int bit;
>  
> -	/* A previous recursion check was made */
> +	/* Called recursively internally by different ftrace code layers? */
>  	if ((val & TRACE_CONTEXT_MASK) > max)
>  		return 0;

>  
> 

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-14 15:14   ` Petr Mladek
  2021-10-15  3:13     ` 王贇
@ 2021-10-15  3:39     ` Steven Rostedt
From: Steven Rostedt @ 2021-10-15  3:39 UTC
  To: Petr Mladek
  Cc: 王贇,
	Guo Ren, Ingo Molnar, James E.J. Bottomley, Helge Deller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
	Borislav Petkov, x86, H. Peter Anvin, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, Colin Ian King,
	Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

On Thu, 14 Oct 2021 17:14:07 +0200
Petr Mladek <pmladek@suse.com> wrote:

>   /**
>    * ftrace_test_recursion_trylock - tests for recursion in same context
>    *
>    * Use this for ftrace callbacks. This will detect if the function
>    * tracing recursed in the same context (normal vs interrupt),
>    *
>    * Returns: -1 if a recursion happened.
> -  *           >= 0 if no recursion
> +  *           >= 0 if no recursion (success)
> +  *
> +  * Disables preemption on success. It is just for convenience.
> +  * Current users need to disable preemption for various reasons.
>    */

I started replying to this to explain the difference between a non-zero
bit and a zero bit, and I think I found a design flaw that can allow
unwanted recursion.

It's late and I'm about to go to bed, but I may have a new patch to fix
this before this gets added, as the fix will conflict with this patch,
and the fix will likely need to go to stable.

Stay tuned.

-- Steve

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-15  3:13     ` 王贇
@ 2021-10-15  4:45       ` 王贇
  2021-10-15  7:28       ` Petr Mladek
From: 王贇 @ 2021-10-15  4:45 UTC
  To: Petr Mladek
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching



On 2021/10/15 11:13 AM, 王贇 wrote:
[snip]
>>  # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
>>  #endif
>>  
>> +/*
>> + * trace_test_and_set_recursion() is called on several layers
>> + * of the ftrace code when handling the same ftrace entry.
>> + * These calls might be nested/recursive.
>> + *
>> + * It uses TRACE_LIST_*BITs to distinguish between this
>> + * internal recursion and recursion caused by the ftrace
>> + * code calling the traced function.
>> + *
>> + * Returns: > 0 when no recursion
>> + *          0 when called recursively internally (safe)
> 
> The 0 can also happen when the ftrace handler recursively calls trylock()
> in the same context, or not?
> 

Never mind... you're right about this.

Regards,
Michael Wang

> Regards,
> Michael Wang
> 
>> + *	    -1 when the traced function was called recursively from
>> + *             the ftrace handler (unsafe)
>> + */
>>  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
>>  							int start, int max)
>>  {
>>  	unsigned int val = READ_ONCE(current->trace_recursion);
>>  	int bit;
>>  
>> -	/* A previous recursion check was made */
>> +	/* Called recursively internally by different ftrace code layers? */
>>  	if ((val & TRACE_CONTEXT_MASK) > max)
>>  		return 0;
> 
>>  
>>

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-15  3:13     ` 王贇
  2021-10-15  4:45       ` 王贇
@ 2021-10-15  7:28       ` Petr Mladek
  2021-10-15  9:12         ` 王贇
From: Petr Mladek @ 2021-10-15  7:28 UTC
  To: 王贇
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

On Fri 2021-10-15 11:13:08, 王贇 wrote:
> 
> 
> On 2021/10/14 11:14 PM, Petr Mladek wrote:
> [snip]
> >> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> >> +	int bit;
> >> +
> >> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> >> +	/*
> >> +	 * The zero bit indicates we are nested
> >> +	 * in another trylock(), which means
> >> +	 * preemption is already disabled.
> >> +	 */
> >> +	if (bit > 0)
> >> +		preempt_disable_notrace();
> > 
> > Is this safe? The preemption is disabled only when
> > trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().
> > 
> > We must either always disable preemption when bit >= 0,
> > or we have to disable preemption already in
> > trace_test_and_set_recursion().
> 
> Internal calls of trace_test_and_set_recursion() will disable preemption
> on success, so it should be safe.

trace_test_and_set_recursion() does _not_ disable preemption!
It works only because all callers disable the preemption.

It means that the comment is wrong. It is not guaranteed that
preemption will be disabled. It works only by chance.


> We could also consider moving the logic into trace_test_and_set_recursion()
> and trace_clear_recursion(), but I'm not very sure about that... ftrace
> internally already makes sure preemption is disabled

How? Because it calls trace_test_and_set_recursion() via the trylock() API?


> ; what is uncovered is those users who call the trylock/unlock API,
> isn't it?

And this is exactly the problem. trace_test_and_set_recursion() is in
a public header. Anyone could use it. And if anyone uses it in the
future without the trylock() and does not disable preemption
explicitly, then we are lost again.

And it is even more dangerous. The original code disabled the
preemption on various layers. As a result, preemption was disabled
several times for sure. It means that the deeper layers were
always on the safe side.

With this patch, if the first trace_test_and_set_recursion() caller
does not disable preemption, then trylock() will not disable it either,
and the entire code proceeds with preemption enabled.


> > Finally, the comment confused me a lot. The difference between nesting and
> >> recursion is far from clear. And the code is tricky like hell :-)
> > I propose to add some comments, see below for a proposal.
> The comment is indeed confusing, I'll make it something like:
> 
> The zero bit indicates trace recursion happened; whether
> the recursive call was made by the ftrace handler or by ftrace
> itself, preemption is already disabled.

I am sorry, but it is still confusing. We need to find a better way
to clearly explain the difference between the safe and
unsafe recursion.

My understanding is that the recursion is:

  + "unsafe" when the trace code recursively enters the same trace point.

  + "safe" when ftrace_test_recursion_trylock() is called recursivelly
    while still processing the same trace entry.
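
For example (a rough sketch, the handler names are made up):

	static void notrace outer_handler(unsigned long ip,
					  unsigned long parent_ip,
					  struct ftrace_ops *op,
					  struct ftrace_regs *fregs)
	{
		int bit = ftrace_test_recursion_trylock(ip, parent_ip);

		if (bit < 0)
			return;	/* the "unsafe" recursion was detected */

		/*
		 * "Safe": inner_handler() may call
		 * ftrace_test_recursion_trylock() again while we are
		 * still processing the same entry.
		 *
		 * "Unsafe": if anything called from here is itself
		 * traced, we re-enter through the same trace point
		 * and the nested trylock() fails.
		 */
		inner_handler(ip, parent_ip, op, fregs);

		ftrace_test_recursion_unlock(bit);
	}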

> >> +
> >> +	return bit;
> >>  }
> >>  /**
> >> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> >>   * @bit: The return of a successful ftrace_test_recursion_trylock()
> >>   *
> >>   * This is used at the end of a ftrace callback.
> >> + *
> >> + * Preemption will be enabled (if it was previously enabled).
> >>   */
> >>  static __always_inline void ftrace_test_recursion_unlock(int bit)
> >>  {
> >> +	if (bit)
> > 
> > This is not symmetric with trylock(). It should be:
> > 
> > 	if (bit > 0)
> > 
> > Anyway, trace_clear_recursion() quietly ignores bit != 0
> 
> Yes, bit == 0 should not happen here.

Yes, it "should" not happen. My point is that we could make the API
safer. We could do the right thing when
ftrace_test_recursion_unlock() is called with a negative @bit.
Ideally, we should also warn about the misuse.


Anyway, let's wait for Steven. It seems that he found another problem
with the API that should be solved first. The fix will probably
also help to better understand the "safe" vs "unsafe" recursion.

Best Regards,
Petr

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-15  7:28       ` Petr Mladek
@ 2021-10-15  9:12         ` 王贇
  2021-10-15 13:35           ` Steven Rostedt
From: 王贇 @ 2021-10-15  9:12 UTC
  To: Petr Mladek
  Cc: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching



On 2021/10/15 3:28 PM, Petr Mladek wrote:
> On Fri 2021-10-15 11:13:08, 王贇 wrote:
>>
>>
>> On 2021/10/14 11:14 PM, Petr Mladek wrote:
>> [snip]
>>>> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>>>> +	int bit;
>>>> +
>>>> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>>>> +	/*
>>>> +	 * The zero bit indicates we are nested
>>>> +	 * in another trylock(), which means
>>>> +	 * preemption is already disabled.
>>>> +	 */
>>>> +	if (bit > 0)
>>>> +		preempt_disable_notrace();
>>>
>>> Is this safe? The preemption is disabled only when
>>> trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().
>>>
>>> We must either always disable preemption when bit >= 0,
>>> or we have to disable preemption already in
>>> trace_test_and_set_recursion().
>>
>> Internal calls of trace_test_and_set_recursion() will disable preemption
>> on success, so it should be safe.
> 
> trace_test_and_set_recursion() does _not_ disable preemption!
> It works only because all callers disable the preemption.

Yup.

> 
> It means that the comment is wrong. It is not guaranteed that
> preemption will be disabled. It works only by chance.
> 
> 
>> We could also consider moving the logic into trace_test_and_set_recursion()
>> and trace_clear_recursion(), but I'm not very sure about that... ftrace
>> internally already makes sure preemption is disabled
> 
> How? Because it calls trace_test_and_set_recursion() via the trylock() API?

I mean that currently all the direct callers of trace_test_and_set_recursion()
have preemption disabled, as you mentioned above; but yes, if anyone later
writes kernel code that calls trace_test_and_set_recursion() without
disabling preemption afterwards, then the promise is broken.

> 
> 
>> ; what is uncovered is those users who call the trylock/unlock API,
>> isn't it?
> 
> And this is exactly the problem. trace_test_and_set_recursion() is in
> a public header. Anyone could use it. And if anyone uses it in the
> future without the trylock() and does not disable preemption
> explicitly, then we are lost again.
> 
> And it is even more dangerous. The original code disabled the
> preemption on various layers. As a result, preemption was disabled
> several times for sure. It means that the deeper layers were
> always on the safe side.
> 
> With this patch, if the first trace_test_and_set_recursion() caller
> does not disable preemption, then trylock() will not disable it either,
> and the entire code proceeds with preemption enabled.

Yes, what confused me at first is that I thought people who call
trace_test_and_set_recursion() without trylock() could only be
ftrace hackers, not users; but if anyone can use it without
respecting the preemption requirements, then I think the logic should be
inside trace_test_and_set_recursion() rather than in trylock().

> 
> 
>>> Finally, the comment confused me a lot. The difference between nesting and
>>> recursion is far from clear. And the code is tricky liky like hell :-)
>>> I propose to add some comments, see below for a proposal.
>> The comment is indeed confusing, I'll make it something like:
>>
>> The zero bit indicates trace recursion happened; whether
>> the recursive call was made by the ftrace handler or by ftrace
>> itself, preemption is already disabled.
> 
> I am sorry, but it is still confusing. We need to find a better way
> to clearly explain the difference between the safe and
> unsafe recursion.
> 
> My understanding is that the recursion is:
> 
>   + "unsafe" when the trace code recursively enters the same trace point.
> 
>   + "safe" when ftrace_test_recursion_trylock() is called recursivelly
>     while still processing the same trace entry.

Maybe some examples would make this easier to understand...

Currently there are two ways of using ftrace_test_recursion_trylock():
one with TRACE_FTRACE_XXX, which we mark as A, and one with TRACE_LIST_XXX,
which we mark as B. Then:

A followed by B on same context got bit > 0
B followed by A on any context got bit 0
A followed by A on same context got bit > 0
A followed by A followed by A on same context got bit -1
B followed by B on same context got bit > 0
B followed by B followed by B on same context got bit -1

If we get rid of the TRACE_TRANSITION_BIT, which allows recursion
one time, then it would be:

A followed by B on same context got bit > 0
B followed by A on any context got bit 0
A followed by A on same context got bit -1
B followed by B on same context got bit -1

So as long as there is no consecutive AAA, it's safe?

> 
>>>> +
>>>> +	return bit;
>>>>  }
>>>>  /**
>>>> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>>>>   * @bit: The return of a successful ftrace_test_recursion_trylock()
>>>>   *
>>>>   * This is used at the end of a ftrace callback.
>>>> + *
>>>> + * Preemption will be enabled (if it was previously enabled).
>>>>   */
>>>>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>>>>  {
>>>> +	if (bit)
>>>
>>> This is not symmetric with trylock(). It should be:
>>>
>>> 	if (bit > 0)
>>>
>>> Anyway, trace_clear_recursion() quietly ignores bit != 0
>>
>> Yes, bit == 0 should not happen here.
> 
> Yes, it "should" not happen. My point is that we could make the API
> safer. We could do the right thing when
> ftrace_test_recursion_unlock() is called with a negative @bit.
> Ideally, we should also warn about the misuse.

Agree with a WARN here on bit 0.

> 
> 
> Anyway, let's wait for Steven. It seems that he found another problem
> with the API that should be solved first. The fix will probably
> also help to better understand the "safe" vs "unsafe" recursion.

Cool~

Regards,
Michael Wang

> 
> Best Regards,
> Petr
> 

* Re: [PATCH v3 1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()
  2021-10-15  9:12         ` 王贇
@ 2021-10-15 13:35           ` Steven Rostedt
From: Steven Rostedt @ 2021-10-15 13:35 UTC
  To: 王贇
  Cc: Petr Mladek, Guo Ren, Ingo Molnar, James E.J. Bottomley,
	Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Colin Ian King, Masami Hiramatsu, Peter Zijlstra (Intel),
	Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
	linux-parisc, linuxppc-dev, linux-riscv, live-patching

On Fri, 15 Oct 2021 17:12:26 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:

> Maybe some examples would make this easier to understand...
> 
> Currently there are two ways of using ftrace_test_recursion_trylock():
> one with TRACE_FTRACE_XXX, which we mark as A, and one with TRACE_LIST_XXX,
> which we mark as B. Then:
> 
> A followed by B on same context got bit > 0
> B followed by A on any context got bit 0
> A followed by A on same context got bit > 0
> A followed by A followed by A on same context got bit -1
> B followed by B on same context got bit > 0
> B followed by B followed by B on same context got bit -1
> 
> If we get rid of the TRACE_TRANSITION_BIT, which allows recursion
> one time, then it would be:

We can't get rid of the transition bit. It's there to fix a bug. Or at
least not until we finish the "noinst" work, which may be done by now? I
have to go revisit it.

The reason for the transition bit is that we were dropping function
traces that people relied on being there. The problem is that the
recursion protection allows for nested contexts. That is, it will not detect
recursion if an interrupt triggers during a trace (while the recursion
lock is held) and then that interrupt does a trace. It is allowed to call
ftrace_test_recursion_trylock() again.

But, what happens if the trace occurs after the interrupt triggers, but
before the preempt_count states that we are now in interrupt context? As
preempt_count is used by this code to determine what context we are in, if
a trace happens in this "transition" state, without the transition bit, a
recursion is detected (false positive) and the event is dropped. People are
then confused about how an interrupt happened when the entry of the
interrupt never triggered (according to the trace).

The transition bit allows one level of recursion to handle this case.
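
To make that concrete, here is a toy user-space model of the bit logic
(heavily simplified; the bit layout and names are made up, this is not
the kernel code):

	#include <stdio.h>

	enum { CTX_NORMAL, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI, BIT_TRANSITION };

	static unsigned int recursion;	/* models current->trace_recursion */

	static int trylock(int ctx)
	{
		if (!(recursion & (1u << ctx))) {
			recursion |= 1u << ctx;
			return ctx;
		}
		/*
		 * Context bit already set: either a real recursion, or the
		 * transition window where an interrupt fired before
		 * preempt_count was updated. Forgive exactly one of those.
		 */
		if (recursion & (1u << BIT_TRANSITION))
			return -1;	/* real recursion: drop the event */
		recursion |= 1u << BIT_TRANSITION;
		return BIT_TRANSITION;
	}

	static void unlock(int bit)
	{
		recursion &= ~(1u << bit);
	}

	int main(void)
	{
		int a = trylock(CTX_NORMAL);	/* normal trace       -> 0  */
		int b = trylock(CTX_NORMAL);	/* transition window  -> 4  */
		int c = trylock(CTX_NORMAL);	/* second nested hit  -> -1 */

		printf("%d %d %d\n", a, b, c);	/* prints "0 4 -1" */
		unlock(b);
		unlock(a);
		return 0;
	}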

The "noinst" work may also remove all locations that can be traced before
the context is updated. I need to see if that is now the case, and if it
is, we can remove the transition bit.

This is not the design issue I mentioned earlier. I'm still working on that
one.

-- Steve


> 
> A followed by B on same context got bit > 0
> B followed by A on any context got bit 0
> A followed by A on same context got bit -1
> B followed by B on same context got bit -1
> 
> So as long as there is no consecutive AAA, it's safe?

