All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tracing/ftrace: Fix the potential hang on MIPS SMP
@ 2010-08-24  6:06 Wu Zhangjin
  2010-08-27 16:49 ` wu zhangjin
  0 siblings, 1 reply; 2+ messages in thread
From: Wu Zhangjin @ 2010-08-24  6:06 UTC (permalink / raw)
  To: Ralf Baechle, Steven Rostedt, David Daney; +Cc: linux-mips

From: Wu Zhangjin <zhangjin.wu@windriver.com>

In Ftrace, we need to flush the icache after code modification to ensure
the instructions will be executed are exactly what we want.

And for the following reason(arch/x86/kernel/ftrace.c):

 * Modifying code must take extra care. On an SMP machine, if
 * the code being modified is also being executed on another CPU
 * that CPU will have undefined results and possibly take a GPF.
 * We use kstop_machine to stop other CPUS from exectuing code.

In SMP, the code modification are protected by stop_machine(), which
will disables the irqs of all cpus and then modify the code, flush the
icache.

In MIPS SMP, to tell the other cpus to flush their related icache, the
IPI interrupt must be sent to them and wait for them exiting from the
icache flushing, but for the stop_machine() have disabled interrupts, it
will need to wait for the other cpus all the time, then deadlock ->
hang.

(Note: cavium is an exception, benefit from its synci instruction, it
doesn't call smp_call_function() to execute the icache flushing
operation but send the ICACHE_FLUSH ipi to the other cpus directly, so,
no wait and no hang on cavium, and after the irqs of the cpus are
enabled, the pending icache flush interrupt will be filed and synci will
flush the icache on every cpu respectively, so, no cache problem).

To break this deadlock, the key is: stop calling flush_icache_range() in
stop_machine() but delay it after stop_machine(). delaying the icache
flushing operation doesn't influence the tracing results even if the
other cpus execute the code just modified before the icache flushing,
for the kernel tracing will only be enabled after users issuing:

$ echo 1 > /path/to/tracing/tracing_enabled

Thanks to the weak functions: ftrace_arch_code_modify_prepare() and
ftrace_arch_code_modify_post_process(). they are called by
ftrace_run_update_code() before and after stop_machine() respectively,
with them, ftrace_modify_code() can check whether it is called in
stop_machine() and if called in stop_machine(), then delay the operation
of icache flushing, as a result, the deadlock is broken.

Without this patch, Ftrace for RMI XLS will hang after issuing the
following command:

$ echo function > /path/to/tracing/current_tracer

Exactly, it hangs on kernel/smp.c:

void smp_call_function_many(const struct cpumask *mask,
{
	[snip]
	/*
	 * Can deadlock when called with interrupts disabled.
	 * We allow cpu's that are not yet online though, as no one else can
	 * send smp call function interrupt to this cpu and as such deadlocks
	 * can't happen.
	 */
	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
		     && !oops_in_progress);
	[snip]
	csd_lock(&data->csd);

	[snip]
	/* Send a message to all CPUs in the map */
	arch_send_call_function_ipi_mask(data->cpumask);

	/* Optionally wait for the CPUs to complete */
	if (wait)
		csd_lock_wait(&data->csd);	--> hang here
}

Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
---
 arch/mips/kernel/ftrace.c |   22 +++++++++++++++++++++-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index 5a84a1f..c8ebb13 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -69,6 +69,23 @@ static inline void ftrace_dyn_arch_init_insns(void)
 #endif
 }
 
+#ifdef CONFIG_SMP
+static int machine_stopped __read_mostly;
+
+int ftrace_arch_code_modify_prepare(void)
+{
+	machine_stopped = 1;
+	return 0;
+}
+
+int ftrace_arch_code_modify_post_process(void)
+{
+	__flush_cache_all();
+	machine_stopped = 0;
+	return 0;
+}
+#endif
+
 static int ftrace_modify_code(unsigned long ip, unsigned int new_code)
 {
 	int faulted;
@@ -79,7 +96,10 @@ static int ftrace_modify_code(unsigned long ip, unsigned int new_code)
 	if (unlikely(faulted))
 		return -EFAULT;
 
-	flush_icache_range(ip, ip + 8);
+#ifdef CONFIG_SMP
+	if (!machine_stopped)
+#endif
+		flush_icache_range(ip, ip + 8);
 
 	return 0;
 }
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] tracing/ftrace: Fix the potential hang on MIPS SMP
  2010-08-24  6:06 [PATCH] tracing/ftrace: Fix the potential hang on MIPS SMP Wu Zhangjin
@ 2010-08-27 16:49 ` wu zhangjin
  0 siblings, 0 replies; 2+ messages in thread
From: wu zhangjin @ 2010-08-27 16:49 UTC (permalink / raw)
  To: Ralf Baechle, Steven Rostedt, David Daney; +Cc: linux-mips

Hi,

This is not applicable, a revision with the new flag
machine_stop_pending instead of machine_stopped will be sent out
tomorrow.

Regards,
Wu Zhangjin

On 8/24/10, Wu Zhangjin <wuzhangjin@gmail.com> wrote:
> From: Wu Zhangjin <zhangjin.wu@windriver.com>
>
> In Ftrace, we need to flush the icache after code modification to ensure
> the instructions will be executed are exactly what we want.
>
> And for the following reason(arch/x86/kernel/ftrace.c):
>
>  * Modifying code must take extra care. On an SMP machine, if
>  * the code being modified is also being executed on another CPU
>  * that CPU will have undefined results and possibly take a GPF.
>  * We use kstop_machine to stop other CPUS from exectuing code.
>
> In SMP, the code modification are protected by stop_machine(), which
> will disables the irqs of all cpus and then modify the code, flush the
> icache.
>
> In MIPS SMP, to tell the other cpus to flush their related icache, the
> IPI interrupt must be sent to them and wait for them exiting from the
> icache flushing, but for the stop_machine() have disabled interrupts, it
> will need to wait for the other cpus all the time, then deadlock ->
> hang.
>
> (Note: cavium is an exception, benefit from its synci instruction, it
> doesn't call smp_call_function() to execute the icache flushing
> operation but send the ICACHE_FLUSH ipi to the other cpus directly, so,
> no wait and no hang on cavium, and after the irqs of the cpus are
> enabled, the pending icache flush interrupt will be filed and synci will
> flush the icache on every cpu respectively, so, no cache problem).
>
> To break this deadlock, the key is: stop calling flush_icache_range() in
> stop_machine() but delay it after stop_machine(). delaying the icache
> flushing operation doesn't influence the tracing results even if the
> other cpus execute the code just modified before the icache flushing,
> for the kernel tracing will only be enabled after users issuing:
>
> $ echo 1 > /path/to/tracing/tracing_enabled
>
> Thanks to the weak functions: ftrace_arch_code_modify_prepare() and
> ftrace_arch_code_modify_post_process(). they are called by
> ftrace_run_update_code() before and after stop_machine() respectively,
> with them, ftrace_modify_code() can check whether it is called in
> stop_machine() and if called in stop_machine(), then delay the operation
> of icache flushing, as a result, the deadlock is broken.
>
> Without this patch, Ftrace for RMI XLS will hang after issuing the
> following command:
>
> $ echo function > /path/to/tracing/current_tracer
>
> Exactly, it hangs on kernel/smp.c:
>
> void smp_call_function_many(const struct cpumask *mask,
> {
> 	[snip]
> 	/*
> 	 * Can deadlock when called with interrupts disabled.
> 	 * We allow cpu's that are not yet online though, as no one else can
> 	 * send smp call function interrupt to this cpu and as such deadlocks
> 	 * can't happen.
> 	 */
> 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> 		     && !oops_in_progress);
> 	[snip]
> 	csd_lock(&data->csd);
>
> 	[snip]
> 	/* Send a message to all CPUs in the map */
> 	arch_send_call_function_ipi_mask(data->cpumask);
>
> 	/* Optionally wait for the CPUs to complete */
> 	if (wait)
> 		csd_lock_wait(&data->csd);	--> hang here
> }
>
> Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
> ---
>  arch/mips/kernel/ftrace.c |   22 +++++++++++++++++++++-
>  1 files changed, 21 insertions(+), 1 deletions(-)
>
> diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
> index 5a84a1f..c8ebb13 100644
> --- a/arch/mips/kernel/ftrace.c
> +++ b/arch/mips/kernel/ftrace.c
> @@ -69,6 +69,23 @@ static inline void ftrace_dyn_arch_init_insns(void)
>  #endif
>  }
>
> +#ifdef CONFIG_SMP
> +static int machine_stopped __read_mostly;
> +
> +int ftrace_arch_code_modify_prepare(void)
> +{
> +	machine_stopped = 1;
> +	return 0;
> +}
> +
> +int ftrace_arch_code_modify_post_process(void)
> +{
> +	__flush_cache_all();
> +	machine_stopped = 0;
> +	return 0;
> +}
> +#endif
> +
>  static int ftrace_modify_code(unsigned long ip, unsigned int new_code)
>  {
>  	int faulted;
> @@ -79,7 +96,10 @@ static int ftrace_modify_code(unsigned long ip, unsigned
> int new_code)
>  	if (unlikely(faulted))
>  		return -EFAULT;
>
> -	flush_icache_range(ip, ip + 8);
> +#ifdef CONFIG_SMP
> +	if (!machine_stopped)
> +#endif
> +		flush_icache_range(ip, ip + 8);
>
>  	return 0;
>  }
> --
> 1.7.0.4
>
>
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2010-08-27 16:49 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-24  6:06 [PATCH] tracing/ftrace: Fix the potential hang on MIPS SMP Wu Zhangjin
2010-08-27 16:49 ` wu zhangjin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.