linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu
@ 2020-04-09  9:38 Wang Qing
  2020-04-09 10:40 ` Mark Rutland
  2020-04-13  9:23 ` Wang Qing
  0 siblings, 2 replies; 4+ messages in thread
From: Wang Qing @ 2020-04-09  9:38 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Eric W. Biederman, Thomas Gleixner, Wang Qing, jinho lim,
	Dave Martin, linux-arm-kernel, linux-kernel
  Cc: opensource.kernel

We cannot get FP and PC when the task is running on another CPU,
task->thread.cpu_context is the last time the task was switched out,
it's better to give a reminder than to provide wrong information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
---
 arch/arm64/kernel/traps.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be..c04e3e8 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
 		start_backtrace(&frame,
 				(unsigned long)__builtin_frame_address(0),
 				(unsigned long)dump_backtrace);
+	} else if (tsk->on_cpu) {
+		/*
+		 * The task is running in another cpu, so the call stack
+		 * is changing and we cannot get it.
+		 */
+		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
+			tsk->comm, tsk->cpu);
+		return;
 	} else {
 		/*
 		 * task blocked in __switch_to
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu
  2020-04-09  9:38 [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu Wang Qing
@ 2020-04-09 10:40 ` Mark Rutland
  2020-04-13  9:23 ` Wang Qing
  1 sibling, 0 replies; 4+ messages in thread
From: Mark Rutland @ 2020-04-09 10:40 UTC (permalink / raw)
  To: Wang Qing
  Cc: Catalin Marinas, Will Deacon, James Morse, Eric W. Biederman,
	Thomas Gleixner, jinho lim, Dave Martin, linux-arm-kernel,
	linux-kernel, opensource.kernel

Hi,

On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
> We cannot get FP and PC when the task is running on another CPU,
> task->thread.cpu_context is the last time the task was switched out,
> it's better to give a reminder than to provide wrong information.
> 
> Signed-off-by: Wang Qing <wangqing@vivo.com>

Are you seeing this happen anywhere in particular today?

> ---
>  arch/arm64/kernel/traps.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index cf402be..c04e3e8 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>  		start_backtrace(&frame,
>  				(unsigned long)__builtin_frame_address(0),
>  				(unsigned long)dump_backtrace);
> +	} else if (tsk->on_cpu) {
> +		/*
> +		 * The task is running in another cpu, so the call stack
> +		 * is changing and we cannot get it.
> +		 */
> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
> +			tsk->comm, tsk->cpu);

I believe that we can race with a concurrent write to tsk->cpu in both
cases above. We could use READ_ONCE() to get a snapshot, but we can
still race and miss cases where the task was runnning as we backtrace
it.

Thanks,
Mark.

> +		return;
>  	} else {
>  		/*
>  		 * task blocked in __switch_to
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE:[PATCH] ARM64: fixed dump_backtrace() when task running on another cpu
  2020-04-09  9:38 [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu Wang Qing
  2020-04-09 10:40 ` Mark Rutland
@ 2020-04-13  9:23 ` Wang Qing
  2020-04-14 12:54   ` [PATCH] " Mark Rutland
  1 sibling, 1 reply; 4+ messages in thread
From: Wang Qing @ 2020-04-13  9:23 UTC (permalink / raw)
  To: mark.rutland, linux-arm-kernel, linux-kernel, wangqing; +Cc: opensource.kernel

>Hi,
>
>On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
>> We cannot get FP and PC when the task is running on another CPU,
>> task->thread.cpu_context is the last time the task was switched out,
>> it's better to give a reminder than to provide wrong information.
>> 
>> Signed-off-by: Wang Qing <wangqing@vivo.com>
>
>Are you seeing this happen anywhere in particular today?

This problem is not so obvious, because it will not cause any exceptions
but will show "old" stack always ending with switch_to, I finally confirmed
the problem through debugging.

For example:Task blocked in spinlock/interrupt/busy loop, I want to print
the backtrace when detected(like PSI in Android), the printing is wrong(old).

>
>> ---
>>  arch/arm64/kernel/traps.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index cf402be..c04e3e8 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>>  		start_backtrace(&frame,
>>  				(unsigned long)__builtin_frame_address(0),
>>  				(unsigned long)dump_backtrace);
>> +	} else if (tsk->on_cpu) {
>> +		/*
>> +		 * The task is running in another cpu, so the call stack
>> +		 * is changing and we cannot get it.
>> +		 */
>> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
>> +			tsk->comm, tsk->cpu);
>
>I believe that we can race with a concurrent write to tsk->cpu in both
>cases above. We could use READ_ONCE() to get a snapshot, but we can
>still race and miss cases where the task was runnning as we backtrace
>it.
>
>Thanks,
>Mark.

I will use task_cpu(tsk) instead of tsk->cpu, and add task_running_oncpu() in
include/linux/sched.h instead of tsk->on_cpu, but as you said, by this patch,
we can still race and miss cases where the task was runnning as we backtrace.

But from the user's perspective, printing wrong backtrace is confused when
we call this function while task already running. However, it's reasonable to
print the last backtrace when task enter running during the function is called.

Thanks,
Wang Qing.

>
>> +		return;
>>  	} else {
>>  		/*
>>  		 * task blocked in __switch_to
>> -- 
>> 2.7.4
>> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu
  2020-04-13  9:23 ` Wang Qing
@ 2020-04-14 12:54   ` Mark Rutland
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Rutland @ 2020-04-14 12:54 UTC (permalink / raw)
  To: Wang Qing; +Cc: linux-arm-kernel, linux-kernel, opensource.kernel

On Mon, Apr 13, 2020 at 05:23:08PM +0800, Wang Qing wrote:
> >Hi,
> >
> >On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
> >> We cannot get FP and PC when the task is running on another CPU,
> >> task->thread.cpu_context is the last time the task was switched out,
> >> it's better to give a reminder than to provide wrong information.
> >> 
> >> Signed-off-by: Wang Qing <wangqing@vivo.com>
> >
> >Are you seeing this happen anywhere in particular today?
> 
> This problem is not so obvious, because it will not cause any exceptions
> but will show "old" stack always ending with switch_to, I finally confirmed
> the problem through debugging.
> 
> For example:Task blocked in spinlock/interrupt/busy loop, I want to print
> the backtrace when detected(like PSI in Android), the printing is wrong(old).

Sure, but *where* are you seeing this?

Is this a problem in mainline, or only in code that you add?
 
> >
> >> ---
> >>  arch/arm64/kernel/traps.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >> 
> >> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> >> index cf402be..c04e3e8 100644
> >> --- a/arch/arm64/kernel/traps.c
> >> +++ b/arch/arm64/kernel/traps.c
> >> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> >>  		start_backtrace(&frame,
> >>  				(unsigned long)__builtin_frame_address(0),
> >>  				(unsigned long)dump_backtrace);
> >> +	} else if (tsk->on_cpu) {
> >> +		/*
> >> +		 * The task is running in another cpu, so the call stack
> >> +		 * is changing and we cannot get it.
> >> +		 */
> >> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
> >> +			tsk->comm, tsk->cpu);
> >
> >I believe that we can race with a concurrent write to tsk->cpu in both
> >cases above. We could use READ_ONCE() to get a snapshot, but we can
> >still race and miss cases where the task was runnning as we backtrace
> >it.
> >
> >Thanks,
> >Mark.
> 
> I will use task_cpu(tsk) instead of tsk->cpu, and add task_running_oncpu() in
> include/linux/sched.h instead of tsk->on_cpu, but as you said, by this patch,
> we can still race and miss cases where the task was runnning as we backtrace.
> 
> But from the user's perspective, printing wrong backtrace is confused when
> we call this function while task already running. However, it's reasonable to
> print the last backtrace when task enter running during the function is called.

The contract of dump_backtrace() is that it's only called for either:

* the current task
* a task that is blocked in switch_to()

... so I don't think that the current behaviour is wrong as such, though
if it's easy to catch misuse I agree it would be nice to do so for
robustness.

However, we can alwatys race here, so detecting this case is best-effort
and not entirely reliable, and I don't want to leave the impression that
it is. I'm also not a fan of pr_warn() here, since this is indicative of
a kernel bug rather than a user/system issue, and can spam the console
due to a lack of ratelimiting.

So, based on whether this is a problem in existing code, I'd like that
code fix first, and then we can consider adding a WARN_ONCE() or
something ratelimited. Ideally something that'll give us the bactrace or
the code that's calling dump_backtrace() erroneously.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-04-14 12:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-09  9:38 [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu Wang Qing
2020-04-09 10:40 ` Mark Rutland
2020-04-13  9:23 ` Wang Qing
2020-04-14 12:54   ` [PATCH] " Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).