All of lore.kernel.org
* [PATCH] arm64: avoid race condition issue in dump_backtrace
@ 2018-03-22  3:06 ` Ji Zhang
  0 siblings, 0 replies; 36+ messages in thread
From: Ji Zhang @ 2018-03-22  3:06 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Matthias Brugger, Mark Rutland,
	Ard Biesheuvel, James Morse, Dave Martin, Marc Zyngier,
	Michael Weiser, Julien Thierry, Xie XiuQi
  Cc: linux-arm-kernel, linux-kernel, linux-mediatek, wsd_upstream,
	Ji Zhang, shadanji

When we dump the backtrace of a specific task, there is a potential race
condition because the task may be running on another core if SMP is
enabled. In the current implementation, if the task is not the current
task, the registers used for unwinding are taken from the cpu_context
saved in thread_info, which is a snapshot from before the last context
switch; but if the task is running on another core, the registers and
the contents of the stack keep changing underneath us.
This can yield a wrong or incomplete backtrace, or even crash the
kernel.
To avoid this, do not dump the backtrace of tasks that are running on
other cores.
This patch cannot solve the issue completely, but it shrinks the race
window.

Signed-off-by: Ji Zhang <ji.zhang@mediatek.com>
---
 arch/arm64/kernel/traps.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index eb2d151..95749364 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
 	if (tsk == current) {
 		frame.fp = (unsigned long)__builtin_frame_address(0);
 		frame.pc = (unsigned long)dump_backtrace;
+	else if (tsk->state == TASK_RUNNING) {
+		pr_notice("Do not dump other running tasks\n");
+		return;
 	} else {
 		/*
 		 * task blocked in __switch_to
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-22  3:06 ` Ji Zhang
@ 2018-03-22  4:57   ` Baruch Siach
  -1 siblings, 0 replies; 36+ messages in thread
From: Baruch Siach @ 2018-03-22  4:57 UTC (permalink / raw)
  To: Ji Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Mark Rutland,
	Ard Biesheuvel, James Morse, Dave Martin, Marc Zyngier,
	Michael Weiser, Julien Thierry, Xie XiuQi, wsd_upstream,
	linux-kernel, linux-mediatek, shadanji, linux-arm-kernel

Hi Ji Zhang,

On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> When we dump the backtrace of some specific task, there is a potential race
> condition due to the task may be running on other cores if SMP enabled.
> That is because for current implementation, if the task is not the current
> task, we will get the registers used for unwind from cpu_context saved in
> thread_info, which is the snapshot before context switch, but if the task
> is running on other cores, the registers and the content of stack are
> changed.
> This may cause that we get the wrong backtrace or incomplete backtrace or
> even crash the kernel.
> To avoid this case, do not dump the backtrace of the tasks which are
> running on other cores.
> This patch cannot solve the issue completely but can shrink the window of
> race condition.
> 
> Signed-off-by: Ji Zhang <ji.zhang@mediatek.com>
> ---
>  arch/arm64/kernel/traps.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index eb2d151..95749364 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>  	if (tsk == current) {
>  		frame.fp = (unsigned long)__builtin_frame_address(0);
>  		frame.pc = (unsigned long)dump_backtrace;
> +	else if (tsk->state == TASK_RUNNING) {

Missing closing brace. Does this build?

> +		pr_notice("Do not dump other running tasks\n");
> +		return;
>  	} else {
>  		/*
>  		 * task blocked in __switch_to

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-22  4:57   ` Baruch Siach
@ 2018-03-22  5:33       ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-03-22  5:33 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Mark Rutland, wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi,
	Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Julien Thierry,
	Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	shadanji-9Onoh4P/yGk, James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Dear Baruch,

Thank you for your kind and careful review.
That was my fault: I used different environments for verifying and for
submitting the patch, and merged between them manually.
Anyway, thanks for the reminder; I will send another patch.

BRs,
Ji 

On Thu, 2018-03-22 at 06:57 +0200, Baruch Siach wrote:
> Hi Ji Zhang,
> 
> On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> > When we dump the backtrace of some specific task, there is a potential race
> > condition due to the task may be running on other cores if SMP enabled.
> > That is because for current implementation, if the task is not the current
> > task, we will get the registers used for unwind from cpu_context saved in
> > thread_info, which is the snapshot before context switch, but if the task
> > is running on other cores, the registers and the content of stack are
> > changed.
> > This may cause that we get the wrong backtrace or incomplete backtrace or
> > even crash the kernel.
> > To avoid this case, do not dump the backtrace of the tasks which are
> > running on other cores.
> > This patch cannot solve the issue completely but can shrink the window of
> > race condition.
> > 
> > Signed-off-by: Ji Zhang <ji.zhang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> > ---
> >  arch/arm64/kernel/traps.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > index eb2d151..95749364 100644
> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> >  	if (tsk == current) {
> >  		frame.fp = (unsigned long)__builtin_frame_address(0);
> >  		frame.pc = (unsigned long)dump_backtrace;
> > +	else if (tsk->state == TASK_RUNNING) {
> 
> Missing closing brace. Does this build?
> 
> > +		pr_notice("Do not dump other running tasks\n");
> > +		return;
> >  	} else {
> >  		/*
> >  		 * task blocked in __switch_to
> 
> baruch
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-22  3:06 ` Ji Zhang
@ 2018-03-22  5:59   ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2018-03-22  5:59 UTC (permalink / raw)
  To: Ji Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
	James Morse, Dave Martin, Marc Zyngier, Michael Weiser,
	Julien Thierry, Xie XiuQi, linux-arm-kernel, linux-kernel,
	linux-mediatek, wsd_upstream, shadanji

On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> When we dump the backtrace of some specific task, there is a potential race
> condition due to the task may be running on other cores if SMP enabled.
> That is because for current implementation, if the task is not the current
> task, we will get the registers used for unwind from cpu_context saved in
> thread_info, which is the snapshot before context switch, but if the task
> is running on other cores, the registers and the content of stack are
> changed.
> This may cause that we get the wrong backtrace or incomplete backtrace or
> even crash the kernel.

When do we call dump_backtrace() on a running task that is not current?

AFAICT, we don't do that in the arm64-specific callers of dump_backtrace(), and
this would have to be some caller of show_stack() in generic code.

We pin the task's stack via try_get_task_stack(), so this cannot be unmapped
while we walk it. In unwind_frame() we check that the frame record falls
entirely within the task's stack. So AFAICT, we cannot crash the kernel here,
though the backtrace may be misleading (and we could potentially get stuck in
an infinite loop).

> To avoid this case, do not dump the backtrace of the tasks which are
> running on other cores.
> This patch cannot solve the issue completely but can shrink the window of
> race condition.

> @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>  	if (tsk == current) {
>  		frame.fp = (unsigned long)__builtin_frame_address(0);
>  		frame.pc = (unsigned long)dump_backtrace;
> +	else if (tsk->state == TASK_RUNNING) {
> +		pr_notice("Do not dump other running tasks\n");
> +		return;

As you note, if we can race with the task being scheduled, this doesn't help.

Can we rule this out at a higher level?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-22  5:59   ` Mark Rutland
@ 2018-03-22  9:35     ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-03-22  9:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, 2018-03-22 at 05:59 +0000, Mark Rutland wrote:
> On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> > When we dump the backtrace of some specific task, there is a potential race
> > condition due to the task may be running on other cores if SMP enabled.
> > That is because for current implementation, if the task is not the current
> > task, we will get the registers used for unwind from cpu_context saved in
> > thread_info, which is the snapshot before context switch, but if the task
> > is running on other cores, the registers and the content of stack are
> > changed.
> > This may cause that we get the wrong backtrace or incomplete backtrace or
> > even crash the kernel.
> 
> When do we call dump_backtrace() on a running task that is not current?
> 
> AFAICT, we don't do that in the arm64-specific callers of dump_backtrace(), and
> this would have to be some caller of show_stack() in generic code.
Yes, show_stack() lets the caller specify a task and dump its backtrace.
For example, SysRq-T (echo t > /proc/sysrq-trigger) uses this to dump
the backtraces of specific tasks.
> 
> We pin the task's stack via try_get_task_stack(), so this cannot be unmapped
> while we walk it. In unwind_frame() we check that the frame record falls
> entirely within the task's stack. So AFAICT, we cannot crash the kernel here,
> though the backtrace may be misleading (and we could potentially get stuck in
> an infinite loop).
You are right. I have checked the code, and the fp check in
unwind_frame() seems strong enough to handle the stack being changed
while the task is running. And as you mentioned, if fp unfortunately
points at its own address, the unwind becomes an infinite loop; but that
is a very low-probability event, so can we just ignore it?
> 
> > To avoid this case, do not dump the backtrace of the tasks which are
> > running on other cores.
> > This patch cannot solve the issue completely but can shrink the window of
> > race condition.
> 
> > @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> >  	if (tsk == current) {
> >  		frame.fp = (unsigned long)__builtin_frame_address(0);
> >  		frame.pc = (unsigned long)dump_backtrace;
> > +	else if (tsk->state == TASK_RUNNING) {
> > +		pr_notice("Do not dump other running tasks\n");
> > +		return;
> 
> As you note, if we can race with the task being scheduled, this doesn't help.
> 
> Can we rule this out at a higher level?
> Thanks,
> Mark.
Actually, my understanding has been that a low-level function should be
transparent to its callers: it should return correct results and handle
unexpected cases itself, which means that if a result may be misleading
we should drop it. That is why I bypass all TASK_RUNNING tasks. I am not
sure whether that understanding is reasonable for this case.

And as for ruling this out at a higher level, do you have any
suggestions? Should it be handled in the caller of show_stack()?

PS: the arm64 dump_backtrace is fairly robust, while the arm one is
weaker: it depends heavily on the contents of the stack (in
c_backtrace) and can indeed crash the kernel when the stack changes. I
have submitted a similar patch for arm ("arm: avoid race condition
issue in dump_backtrace"). If we can find a way to handle this at a
higher level, could it cover both arm and arm64?

Thank you for your kind support.

Best Regards,
Ji

^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-22  3:06 ` Ji Zhang
  (?)
@ 2018-03-24 11:01   ` kbuild test robot
  -1 siblings, 0 replies; 36+ messages in thread
From: kbuild test robot @ 2018-03-24 11:01 UTC (permalink / raw)
  To: Ji Zhang
  Cc: kbuild-all, Catalin Marinas, Will Deacon, Matthias Brugger,
	Mark Rutland, Ard Biesheuvel, James Morse, Dave Martin,
	Marc Zyngier, Michael Weiser, Julien Thierry, Xie XiuQi,
	linux-arm-kernel, linux-kernel, linux-mediatek, wsd_upstream,
	Ji Zhang, shadanji

[-- Attachment #1: Type: text/plain, Size: 2819 bytes --]

Hi Ji,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.16-rc6 next-20180323]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ji-Zhang/arm64-avoid-race-condition-issue-in-dump_backtrace/20180324-165040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/kernel/traps.c: In function 'dump_backtrace':
>> arch/arm64/kernel/traps.c:116:2: error: expected '}' before 'else'
     else if (tsk->state == TASK_RUNNING) {
     ^~~~

vim +116 arch/arm64/kernel/traps.c

    99	
   100	void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
   101	{
   102		struct stackframe frame;
   103		int skip;
   104	
   105		pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
   106	
   107		if (!tsk)
   108			tsk = current;
   109	
   110		if (!try_get_task_stack(tsk))
   111			return;
   112	
   113		if (tsk == current) {
   114			frame.fp = (unsigned long)__builtin_frame_address(0);
   115			frame.pc = (unsigned long)dump_backtrace;
 > 116		else if (tsk->state == TASK_RUNNING) {
   117			pr_notice("Do not dump other running tasks\n");
   118			return;
   119		} else {
   120			/*
   121			 * task blocked in __switch_to
   122			 */
   123			frame.fp = thread_saved_fp(tsk);
   124			frame.pc = thread_saved_pc(tsk);
   125		}
   126	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
   127		frame.graph = tsk->curr_ret_stack;
   128	#endif
   129	
   130		skip = !!regs;
   131		printk("Call trace:\n");
   132		do {
   133			/* skip until specified stack frame */
   134			if (!skip) {
   135				dump_backtrace_entry(frame.pc);
   136			} else if (frame.fp == regs->regs[29]) {
   137				skip = 0;
   138				/*
   139				 * Mostly, this is the case where this function is
   140				 * called in panic/abort. As exception handler's
   141				 * stack frame does not contain the corresponding pc
   142				 * at which an exception has taken place, use regs->pc
   143				 * instead.
   144				 */
   145				dump_backtrace_entry(regs->pc);
   146			}
   147		} while (!unwind_frame(tsk, &frame));
   148	
   149		put_task_stack(tsk);
   150	}
   151	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37896 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
@ 2018-03-24 11:01   ` kbuild test robot
  0 siblings, 0 replies; 36+ messages in thread
From: kbuild test robot @ 2018-03-24 11:01 UTC (permalink / raw)
  Cc: kbuild-all, Catalin Marinas, Will Deacon, Matthias Brugger,
	Mark Rutland, Ard Biesheuvel, James Morse, Dave Martin,
	Marc Zyngier, Michael Weiser, Julien Thierry, Xie XiuQi,
	linux-arm-kernel, linux-kernel, linux-mediatek, wsd_upstream,
	Ji Zhang, shadanji

[-- Attachment #1: Type: text/plain, Size: 2819 bytes --]

Hi Ji,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.16-rc6 next-20180323]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ji-Zhang/arm64-avoid-race-condition-issue-in-dump_backtrace/20180324-165040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/kernel/traps.c: In function 'dump_backtrace':
>> arch/arm64/kernel/traps.c:116:2: error: expected '}' before 'else'
     else if (tsk->state == TASK_RUNNING) {
     ^~~~

vim +116 arch/arm64/kernel/traps.c

    99	
   100	void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
   101	{
   102		struct stackframe frame;
   103		int skip;
   104	
   105		pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
   106	
   107		if (!tsk)
   108			tsk = current;
   109	
   110		if (!try_get_task_stack(tsk))
   111			return;
   112	
   113		if (tsk == current) {
   114			frame.fp = (unsigned long)__builtin_frame_address(0);
   115			frame.pc = (unsigned long)dump_backtrace;
 > 116		else if (tsk->state == TASK_RUNNING) {
   117			pr_notice("Do not dump other running tasks\n");
   118			return;
   119		} else {
   120			/*
   121			 * task blocked in __switch_to
   122			 */
   123			frame.fp = thread_saved_fp(tsk);
   124			frame.pc = thread_saved_pc(tsk);
   125		}
   126	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
   127		frame.graph = tsk->curr_ret_stack;
   128	#endif
   129	
   130		skip = !!regs;
   131		printk("Call trace:\n");
   132		do {
   133			/* skip until specified stack frame */
   134			if (!skip) {
   135				dump_backtrace_entry(frame.pc);
   136			} else if (frame.fp == regs->regs[29]) {
   137				skip = 0;
   138				/*
   139				 * Mostly, this is the case where this function is
   140				 * called in panic/abort. As exception handler's
   141				 * stack frame does not contain the corresponding pc
   142				 * at which an exception has taken place, use regs->pc
   143				 * instead.
   144				 */
   145				dump_backtrace_entry(regs->pc);
   146			}
   147		} while (!unwind_frame(tsk, &frame));
   148	
   149		put_task_stack(tsk);
   150	}
   151	


^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-22  9:35     ` Ji.Zhang
@ 2018-03-26 11:39       ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2018-03-26 11:39 UTC (permalink / raw)
  To: Ji.Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
	James Morse, Dave Martin, Marc Zyngier, Michael Weiser,
	Julien Thierry, Xie XiuQi, linux-arm-kernel, linux-kernel,
	linux-mediatek, wsd_upstream, shadanji

On Thu, Mar 22, 2018 at 05:35:29PM +0800, Ji.Zhang wrote:
> On Thu, 2018-03-22 at 05:59 +0000, Mark Rutland wrote:
> > On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> > > When we dump the backtrace of some specific task, there is a potential race
> > > condition due to the task may be running on other cores if SMP enabled.
> > > That is because for current implementation, if the task is not the current
> > > task, we will get the registers used for unwind from cpu_context saved in
> > > thread_info, which is the snapshot before context switch, but if the task
> > > is running on other cores, the registers and the content of stack are
> > > changed.
> > > This may cause that we get the wrong backtrace or incomplete backtrace or
> > > even crash the kernel.
> > 
> > When do we call dump_backtrace() on a running task that is not current?
> > 
> > AFAICT, we don't do that in the arm64-specific callers of dump_backtrace(), and
> > this would have to be some caller of show_stack() in generic code.
> Yes, show_stack() lets the caller specify a task and dump its backtrace.
> For example, SysRq-T (echo t > /proc/sysrq-trigger) will use this to
> dump the backtrace of specific tasks.

Ok. I see that this eventually calls show_state_filter(0), where we call
sched_show_task() for every task.

> > We pin the task's stack via try_get_task_stack(), so this cannot be unmapped
> > while we walk it. In unwind_frame() we check that the frame record falls
> > entirely within the task's stack. So AFAICT, we cannot crash the kernel here,
> > though the backtrace may be misleading (and we could potentially get stuck in
> > an infinite loop).
> You are right, I have checked the code and it seems that the check on
> fp in unwind_frame() is strong enough to handle the case in which the
> stack is changed while the task is running. And as you mentioned, if
> fp unfortunately points to its own address, the unwind will be an
> infinite loop, but that is a very low-probability event, so can we
> ignore it?

I think that it would be preferable to try to avoid the infinite loop
case. We could hit that by accident if we're tracing a live task.

It's a little tricky to ensure that we don't loop, since we can have
traces that span several stacks, e.g. overflow -> irq -> task, so we
need to know where the last frame was, and we need to define a strict
order for stack nesting.

> > > To avoid this case, do not dump the backtrace of the tasks which are
> > > running on other cores.
> > > This patch cannot solve the issue completely but can shrink the window of
> > > race condition.
> > 
> > > @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> > >  	if (tsk == current) {
> > >  		frame.fp = (unsigned long)__builtin_frame_address(0);
> > >  		frame.pc = (unsigned long)dump_backtrace;
> > > +	else if (tsk->state == TASK_RUNNING) {
> > > +		pr_notice("Do not dump other running tasks\n");
> > > +		return;
> > 
> > As you note, if we can race with the task being scheduled, this doesn't help.
> > 
> > Can we rule this out at a higher level?
> > Thanks,
> > Mark.
> Actually, according to my previous understanding, the low level function
> should be transparent to callers and should provide the right result and
> handle some unexpected cases, which means that if the result may be
> misleading, we should drop it. That is why I bypass all TASK_RUNNING
> tasks. I am not sure if this understanding is reasonable for this case.

Given that this can change under our feet, I think this only provides a
false sense of security and complicates the code.

> And since you mentioned ruling this out at a higher level, are there
> any suggestions, e.g. handling this in the caller of show_stack()?

Unfortunately, it doesn't look like we can do this in general given
cases like sysrq-t.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-26 11:39       ` Mark Rutland
@ 2018-03-28  9:33           ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-03-28  9:33 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 2018-03-26 at 12:39 +0100, Mark Rutland wrote:
> On Thu, Mar 22, 2018 at 05:35:29PM +0800, Ji.Zhang wrote:
> > On Thu, 2018-03-22 at 05:59 +0000, Mark Rutland wrote:
> > > On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> > > > When we dump the backtrace of some specific task, there is a potential race
> > > > condition due to the task may be running on other cores if SMP enabled.
> > > > That is because for current implementation, if the task is not the current
> > > > task, we will get the registers used for unwind from cpu_context saved in
> > > > thread_info, which is the snapshot before context switch, but if the task
> > > > is running on other cores, the registers and the content of stack are
> > > > changed.
> > > > This may cause that we get the wrong backtrace or incomplete backtrace or
> > > > even crash the kernel.
> > > 
> > > When do we call dump_backtrace() on a running task that is not current?
> > > 
> > > AFAICT, we don't do that in the arm64-specific callers of dump_backtrace(), and
> > > this would have to be some caller of show_stack() in generic code.
> > Yes, show_stack() lets the caller specify a task and dump its backtrace.
> > For example, SysRq-T (echo t > /proc/sysrq-trigger) will use this to
> > dump the backtrace of specific tasks.
> 
> Ok. I see that this eventually calls show_state_filter(0), where we call
> sched_show_task() for every task.
> 
> > > We pin the task's stack via try_get_task_stack(), so this cannot be unmapped
> > > while we walk it. In unwind_frame() we check that the frame record falls
> > > entirely within the task's stack. So AFAICT, we cannot crash the kernel here,
> > > though the backtrace may be misleading (and we could potentially get stuck in
> > > an infinite loop).
> > You are right, I have checked the code and it seems that the check on
> > fp in unwind_frame() is strong enough to handle the case in which the
> > stack is changed while the task is running. And as you mentioned, if
> > fp unfortunately points to its own address, the unwind will be an
> > infinite loop, but that is a very low-probability event, so can we
> > ignore it?
> 
> I think that it would be preferable to try to avoid the infinite loop
> case. We could hit that by accident if we're tracing a live task.
> 
> It's a little tricky to ensure that we don't loop, since we can have
> traces that span several stacks, e.g. overflow -> irq -> task, so we
> need to know where the last frame was, and we need to define a strict
> order for stack nesting.
Can we approach this in an easier way? According to the AArch64 PCS, the
stack should be full-descending, which means we can validate fp by
comparing the current fp with the previous one: if they are equal, there
is an exact loop, while if the current fp is smaller than the previous
one, the unwind has rolled back, which is also unexpected. The only
concern is how to handle an unwind that spans from one stack to another
(e.g. overflow->irq, or irq->task, etc.).
The diff below is a proposal: check whether the trace has spanned onto
another stack, and if so, use a trick to bypass the fp check.

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index eb2d151..760ea59 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -101,6 +101,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
 {
        struct stackframe frame;
        int skip;
+       unsigned long fp = 0x0;

        pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);

@@ -127,6 +128,20 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
        skip = !!regs;
        printk("Call trace:\n");
        do {
+               unsigned long stack;
+               if (fp) {
+                       if (in_entry_text(frame.pc)) {
+                               stack = frame.fp - offsetof(struct pt_regs, stackframe);
+                               /* spanning onto another stack: let the fp check pass once */
+                               if (on_accessible_stack(tsk, stack))
+                                       fp = frame.fp - 0x8;
+                       }
+                       if (frame.fp <= fp) {
+                               pr_notice("fp invalid, stop unwind\n");
+                               break;
+                       }
+               }
+               fp = frame.fp;
                /* skip until specified stack frame */
                if (!skip) {
                        dump_backtrace_entry(frame.pc);
> > > > To avoid this case, do not dump the backtrace of the tasks which are
> > > > running on other cores.
> > > > This patch cannot solve the issue completely but can shrink the window of
> > > > race condition.
> > > 
> > > > @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> > > >  	if (tsk == current) {
> > > >  		frame.fp = (unsigned long)__builtin_frame_address(0);
> > > >  		frame.pc = (unsigned long)dump_backtrace;
> > > > +	else if (tsk->state == TASK_RUNNING) {
> > > > +		pr_notice("Do not dump other running tasks\n");
> > > > +		return;
> > > 
> > > As you note, if we can race with the task being scheduled, this doesn't help.
> > > 
> > > Can we rule this out at a higher level?
> > > Thanks,
> > > Mark.
> > Actually, according to my previous understanding, the low level function
> > should be transparent to callers and should provide the right result and
> > handle some unexpected cases, which means that if the result may be
> > misleading, we should drop it. That is why I bypass all TASK_RUNNING
> > tasks. I am not sure if this understanding is reasonable for this case.
> 
> Given that this can change under our feet, I think this only provides a
> false sense of security and complicates the code.
> 
Agreed
> > And since you mentioned ruling this out at a higher level, are there
> > any suggestions, e.g. handling this in the caller of show_stack()?
> 
> Unfortunately, it doesn't look like we can do this in general given
> cases like sysrq-t.
> 
> Thanks,
> Mark.
Best Regards,
Ji

^ permalink raw reply related	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-28  9:33           ` Ji.Zhang
@ 2018-03-28 10:12             ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2018-03-28 10:12 UTC (permalink / raw)
  To: Ji.Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
	James Morse, Dave Martin, Marc Zyngier, Michael Weiser,
	Julien Thierry, Xie XiuQi, linux-arm-kernel, linux-kernel,
	linux-mediatek, wsd_upstream, shadanji

On Wed, Mar 28, 2018 at 05:33:32PM +0800, Ji.Zhang wrote:
> On Mon, 2018-03-26 at 12:39 +0100, Mark Rutland wrote:
> > I think that it would be preferable to try to avoid the infinite loop
> > case. We could hit that by accident if we're tracing a live task.
> > 
> > It's a little tricky to ensure that we don't loop, since we can have
> > traces that span several stacks, e.g. overflow -> irq -> task, so we
> > need to know where the last frame was, and we need to define a strict
> > order for stack nesting.
> Can we approach this in an easier way? According to the AArch64 PCS, the
> stack should be full-descending, which means we can validate fp by
> comparing the current fp with the previous one: if they are equal, there
> is an exact loop, while if the current fp is smaller than the previous
> one, the unwind has rolled back, which is also unexpected. The only
> concern is how to handle an unwind that spans from one stack to another
> (e.g. overflow->irq, or irq->task, etc.).
> The diff below is a proposal: check whether the trace has spanned onto
> another stack, and if so, use a trick to bypass the fp check.
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index eb2d151..760ea59 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -101,6 +101,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>  {
>         struct stackframe frame;
>         int skip;
> +       unsigned long fp = 0x0;
> 
>         pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
> 
> @@ -127,6 +128,20 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>         skip = !!regs;
>         printk("Call trace:\n");
>         do {
> +               unsigned long stack;
> +               if (fp) {
> +                       if (in_entry_text(frame.pc)) {
> +                               stack = frame.fp - offsetof(struct pt_regs, stackframe);
> +                               /* spanning onto another stack: let the fp check pass once */
> +                               if (on_accessible_stack(tsk, stack))
> +                                       fp = frame.fp - 0x8;
> +                       }
> +                       if (frame.fp <= fp) {
> +                               pr_notice("fp invalid, stop unwind\n");
> +                               break;
> +                       }
> +               }
> +               fp = frame.fp;

I'm very much not keen on this.

I think that if we're going to do this, the only sane way to do it is to
have unwind_frame() verify the current fp against the previous one, and
verify that we have some strict nesting of stacks. Generally, that means
we can go:

  overflow -> irq -> task

... though I'm not sure what to do about the SDEI stack vs the overflow
stack.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread


* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-28 10:12             ` Mark Rutland
@ 2018-03-30  8:08                 ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-03-30  8:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, 2018-03-28 at 11:12 +0100, Mark Rutland wrote:
> On Wed, Mar 28, 2018 at 05:33:32PM +0800, Ji.Zhang wrote:
> 
> I'm very much not keen on this.
> 
> I think that if we're going to do this, the only sane way to do it is to
> have unwind_frame() verify the current fp against the previous one, and
> verify that we have some strict nesting of stacks. Generally, that means
> we can go:
> 
>   overflow -> irq -> task
> 
> ... though I'm not sure what to do about the SDEI stack vs the overflow
> stack.
Actually I had the fp check in unwind_frame(), but since I use
in_entry_text() to determine whether the stack spans, and I did not want
to include traps.h in stacktrace.c, I moved the check out to
dump_backtrace().
Anyway, it seems the key point is how we should verify the nesting of
stacks. Since in unwind_frame() we already have the previous fp and the
current fp, could we assume that if these two fps do NOT belong to the
same stack, the trace spans stacks (whether task->irq, irq->overflow,
etc.), and apply the trick to bypass the fp check? A sample of the
proposal looks like:

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 902f9ed..fc2bf4d 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -92,4 +92,18 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
        return false;
 }

+static inline bool on_same_stack(struct task_struct *tsk, unsigned long sp1, unsigned long sp2)
+{
+       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
+               return true;
+       if (on_irq_stack(sp1) && on_irq_stack(sp2))
+               return true;
+       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
+               return true;
+       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
+               return true;
+
+       return false;
+}
+
 #endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d5718a0..4907a67 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -56,6 +56,13 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
        frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
        frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));

+       if (!on_same_stack(tsk, fp, frame->fp))
+               fp = frame->fp + 0x8;
+       if (fp <= frame->fp) {
+               pr_notice("fp invalid, stop unwind\n");
+               return -EINVAL;
+       }
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        if (tsk->ret_stack &&
                        (frame->pc == (unsigned long)return_to_handler)) {

Could this work?

Best Regards,
Ji

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-03-30  8:08                 ` Ji.Zhang
@ 2018-04-04  9:04                   ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2018-04-04  9:04 UTC (permalink / raw)
  To: Ji.Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
	James Morse, Dave Martin, Marc Zyngier, Michael Weiser,
	Julien Thierry, Xie XiuQi, linux-arm-kernel, linux-kernel,
	linux-mediatek, wsd_upstream, shadanji

On Fri, Mar 30, 2018 at 04:08:12PM +0800, Ji.Zhang wrote:
> On Wed, 2018-03-28 at 11:12 +0100, Mark Rutland wrote:
> > On Wed, Mar 28, 2018 at 05:33:32PM +0800, Ji.Zhang wrote:
> > 
> > I'm very much not keen on this.
> > 
> > I think that if we're going to do this, the only sane way to do it is to
> > have unwind_frame() verify the current fp against the previous one, and
> > verify that we have some strict nesting of stacks. Generally, that means
> > we can go:
> > 
> >   overflow -> irq -> task
> > 
> > ... though I'm not sure what to do about the SDEI stack vs the overflow
> > stack.
> Actually I had the fp check in unwind_frame(), but since I use
> in_entry_text() to determine whether the stack spans, and I did not
> want to include traps.h in stacktrace.c, I moved the check out to
> dump_backtrace().
> Anyway, it seems the key point is how we should verify the nesting of
> stacks. Since in unwind_frame() we already have the previous fp and the
> current fp, could we assume that if these two fps do NOT belong to the
> same stack, the trace spans stacks (whether task->irq, irq->overflow,
> etc.), and apply the trick to bypass the fp check? A sample of the
> proposal looks like:

Unfortunately, this still allows for loops, like task -> irq -> task, so
I think if we're going to try to fix this, we must define a nesting
order and check against that.
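
In sketch form, such a nesting-order check could look like the
following (the stack types, their ranking, and all names here are
purely illustrative, not the eventual kernel implementation, and where
the SDEI stack fits in the order remains an open question):

```c
#include <stdbool.h>

/* Hypothetical stack types, ranked by nesting depth: an unwind may
 * stay on the same stack or move outward (e.g. overflow -> irq ->
 * task), but never back inward, which rules out loops such as
 * task -> irq -> task.
 */
enum stack_type {
        STACK_TYPE_TASK,        /* outermost */
        STACK_TYPE_IRQ,
        STACK_TYPE_OVERFLOW,    /* innermost; SDEI placement unresolved */
};

/* A step is valid when the next frame's stack is the same as, or
 * ranked strictly outside, the current one.
 */
static bool nesting_ok(enum stack_type cur, enum stack_type next)
{
        return next <= cur;
}
```

With this ordering, overflow -> irq and irq -> task transitions pass,
while any attempt to re-enter the irq stack from the task stack fails
the check and the unwind can be terminated.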

Thanks,
Mark.

> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index 902f9ed..fc2bf4d 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -92,4 +92,18 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
>         return false;
>  }
> 
> +static inline bool on_same_stack(struct task_struct *tsk, unsigned long sp1, unsigned long sp2)
> +{
> +       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
> +               return true;
> +       if (on_irq_stack(sp1) && on_irq_stack(sp2))
> +               return true;
> +       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
> +               return true;
> +       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
> +               return true;
> +
> +       return false;
> +}
> +
>  #endif /* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index d5718a0..4907a67 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -56,6 +56,13 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> 
> +       if (!on_same_stack(tsk, fp, frame->fp))
> +               fp = frame->fp + 0x8;
> +       if (fp <= frame->fp) {
> +               pr_notice("fp invalid, stop unwind\n");
> +               return -EINVAL;
> +       }
> +
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         if (tsk->ret_stack &&
>                         (frame->pc == (unsigned long)return_to_handler)) {
> 
> Could this work?
> 
> Best Regards,
> Ji
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-04-04  9:04                   ` Mark Rutland
@ 2018-04-08  7:58                       ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-04-08  7:58 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, 2018-04-04 at 10:04 +0100, Mark Rutland wrote:
> On Fri, Mar 30, 2018 at 04:08:12PM +0800, Ji.Zhang wrote:
> > On Wed, 2018-03-28 at 11:12 +0100, Mark Rutland wrote:
> > > On Wed, Mar 28, 2018 at 05:33:32PM +0800, Ji.Zhang wrote:
> > > 
> > > I'm very much not keen on this.
> > > 
> > > I think that if we're going to do this, the only sane way to do it is to
> > > have unwind_frame() verify the current fp against the previous one, and
> > > verify that we have some strict nesting of stacks. Generally, that means
> > > we can go:
> > > 
> > >   overflow -> irq -> task
> > > 
> > > ... though I'm not sure what to do about the SDEI stack vs the overflow
> > > stack.
> > Actually I have had the fp check in unwind_frame(), but since I use the
> > in_entry_text() to determine if stack spans, and I did not want to
> > include traps.h in stacktrace.c, so I move the check out to
> > dump_backtrace.
> > Anyway, It seems that the key point is how should we verify that there
> > are some nesting of stacks. Since in unwind_frame() we already have the
> > previous fp and current fp, could we assume that if these two fps are
> > NOT belong to the same stack, there should be stack spans (no matter
> > task->irq, or irq->overflow, etc), and we can do the tricky to bypass
> > the fp check.The sample of the prosal just like:
> 
> Unfortuantely, this still allows for loops, like task -> irq -> task, so
> I think if we're going to try to fix this, we must define a nesting
> order and check against that.
Yes, I see where the loop is; I had missed that the loop may cross
different stacks.
Defining a nesting order and checking against it is a good idea, and it
resolves the issue exactly, but as you mentioned before, it is unclear
how to handle the overflow and sdei stacks, and the nesting order is
strongly tied to the scenario of each stack, which means that if someday
we add another stack, we must consider the relationship of the new
stack with all the others. In your expert opinion, is it suitable to do
this in the unwinder?

Or could we find some way that is easier, though less accurate, e.g.:
Proposal 1:
When we unwind and detect that the trace spans stacks, record the last
fp of the previous stack, and the next time we get into the same stack,
compare the new fp with that last fp; the new fp should still be larger
than the last fp, otherwise there is a potential loop.
For example, when we unwind from irq to task, we record the last fp on
the irq stack, e.g. last_irq_fp, and if the unwind goes from the task
stack back to an irq stack, no matter whether it is the same irq stack
as before, just let it go and compare the new irq fp with last_irq_fp.
Although the process may be wrong, since a task stack should not unwind
back to an irq stack, the whole process will eventually stop.
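
A minimal user-space sketch of this per-stack last-fp record might look
like the following (the stack indices, array, and function names are
invented for illustration; real code would key off the on_task_stack(),
on_irq_stack(), etc. helpers):

```c
#include <stdbool.h>

#define NR_STACK_TYPES 4        /* task, irq, overflow, sdei */

/* Last fp seen on each stack; 0 means that stack has not been visited
 * yet. On a full-descending stack the unwind walks towards higher
 * addresses, so each new fp on a given stack must be strictly larger
 * than the last fp recorded there; anything else indicates a loop.
 */
static unsigned long last_fp[NR_STACK_TYPES];

static bool record_fp(int stack, unsigned long fp)
{
        if (last_fp[stack] != 0 && fp <= last_fp[stack])
                return false;   /* potential loop: stop unwinding */
        last_fp[stack] = fp;
        return true;
}
```

Even if the unwind wrongly re-enters a stack it already left, the
strictly increasing fp requirement guarantees the walk terminates.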

Proposal 2:
So far we have four types of stack: task, irq, overflow and sdei. Could
we just assume that the MAX number of stack transitions is 3
(task->irq->overflow->sdei or task->irq->sdei->overflow)? If yes, we can
just check the number of stack transitions whenever we detect that the
stack spans.
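
Proposal 2 could be sketched as follows (the bound, parameter names,
and helper are illustrative only, not a concrete kernel patch):

```c
#include <stdbool.h>

#define MAX_STACK_SPANS 3       /* e.g. task -> irq -> overflow -> sdei */

/* Count the cross-stack transitions seen during one unwind; a
 * legitimate trace can cross at most MAX_STACK_SPANS stack boundaries,
 * so exceeding that bound is treated as a loop.
 */
static bool check_span(int *nr_spans, int prev_stack, int cur_stack)
{
        if (cur_stack != prev_stack && ++(*nr_spans) > MAX_STACK_SPANS)
                return false;   /* too many transitions: stop unwinding */
        return true;
}
```

This is cruder than proposal 1 (it cannot tell a valid transition from
an invalid one), but it bounds the walk with almost no bookkeeping.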

Best Regards,
Ji
> 
> Thanks,
> Mark.
> 
> > diff --git a/arch/arm64/include/asm/stacktrace.h
> > b/arch/arm64/include/asm/stacktrace.h
> > index 902f9ed..fc2bf4d 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -92,4 +92,18 @@ static inline bool on_accessible_stack(struct
> > task_struct *tsk, unsigned long sp
> >         return false;
> >  }
> > 
> > +static inline bool on_same_stack(struct task_struct *tsk, unsigned long
> > sp1, unsigned sp2)
> > +{
> > +       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
> > +               return true;
> > +       if (on_irq_stack(sp1) && on_irq_stack(sp2))
> > +               return true;
> > +       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
> > +               return true;
> > +       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
> > +               return true;
> > +
> > +       return false;
> > +}
> > +
> >  #endif /* __ASM_STACKTRACE_H */
> > diff --git a/arch/arm64/kernel/stacktrace.c
> > b/arch/arm64/kernel/stacktrace.c
> > index d5718a0..4907a67 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -56,6 +56,13 @@ int notrace unwind_frame(struct task_struct *tsk,
> > struct stackframe *frame)
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > 
> > +       if (!on_same_stack(fp, frame->fp))
> > +               fp = frame->fp + 0x8;
> > +       if (fp <= frame->fp) {
> > +               pr_notice("fp invalid, stop unwind\n");
> > +               return -EINVAL;
> > +       }
> > +
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         if (tsk->ret_stack &&
> >                         (frame->pc == (unsigned long)return_to_handler))
> > {
> > 
> > Could this work?
> > 
> > Best Regards,
> > Ji
> > 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-04-08  7:58                       ` Ji.Zhang
@ 2018-04-09 11:26                         ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2018-04-09 11:26 UTC (permalink / raw)
  To: Ji.Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
	James Morse, Dave Martin, Marc Zyngier, Michael Weiser,
	Julien Thierry, Xie XiuQi, linux-arm-kernel, linux-kernel,
	linux-mediatek, wsd_upstream, shadanji

On Sun, Apr 08, 2018 at 03:58:48PM +0800, Ji.Zhang wrote:
> Yes, I see where the loop is; I had missed that the loop may cross
> different stacks.
> Defining a nesting order and checking against it is a good idea, and it
> resolves the issue exactly, but as you mentioned before, it is unclear
> how to handle the overflow and sdei stacks, and the nesting order is
> strongly tied to the scenario of each stack, which means that if someday
> we add another stack, we must consider the relationship of the new
> stack with all the others. In your expert opinion, is it suitable to do
> this in the unwinder?
> 
> Or could we find some way that is easier, though less accurate, e.g.:
> Proposal 1:
> When we unwind and detect that the trace spans stacks, record the last
> fp of the previous stack, and the next time we get into the same stack,
> compare the new fp with that last fp; the new fp should still be larger
> than the last fp, otherwise there is a potential loop.
> For example, when we unwind from irq to task, we record the last fp on
> the irq stack, e.g. last_irq_fp, and if the unwind goes from the task
> stack back to an irq stack, no matter whether it is the same irq stack
> as before, just let it go and compare the new irq fp with last_irq_fp.
> Although the process may be wrong, since a task stack should not unwind
> back to an irq stack, the whole process will eventually stop.

I agree that saving the last fp per-stack could work.

> Proposal 2:
> So far we have four types of stack: task, irq, overflow and sdei. Could
> we just assume that the MAX number of stack transitions is 3
> (task->irq->overflow->sdei or task->irq->sdei->overflow)? If yes, we can
> just check the number of stack transitions whenever we detect that the
> stack spans.

I also agree that counting the number of stack transitions will prevent
an infinite loop, even if less accurately than proposal 1.

I don't have a strong preference either way.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-04-09 11:26                         ` Mark Rutland
@ 2018-04-11  6:30                             ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-04-11  6:30 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 2018-04-09 at 12:26 +0100, Mark Rutland wrote:
> On Sun, Apr 08, 2018 at 03:58:48PM +0800, Ji.Zhang wrote:
> > Yes, I see where the loop is; I had missed that the loop may cross
> > different stacks.
> > Defining a nesting order and checking against it is a good idea, and it
> > resolves the issue exactly, but as you mentioned before, it is unclear
> > how to handle the overflow and sdei stacks, and the nesting order is
> > strongly tied to the scenario of each stack, which means that if someday
> > we add another stack, we must consider the relationship of the new
> > stack with all the others. In your expert opinion, is it suitable to do
> > this in the unwinder?
> > 
> > Or could we find some way that is easier, though less accurate, e.g.:
> > Proposal 1:
> > When we unwind and detect that the trace spans stacks, record the last
> > fp of the previous stack, and the next time we get into the same stack,
> > compare the new fp with that last fp; the new fp should still be larger
> > than the last fp, otherwise there is a potential loop.
> > For example, when we unwind from irq to task, we record the last fp on
> > the irq stack, e.g. last_irq_fp, and if the unwind goes from the task
> > stack back to an irq stack, no matter whether it is the same irq stack
> > as before, just let it go and compare the new irq fp with last_irq_fp.
> > Although the process may be wrong, since a task stack should not unwind
> > back to an irq stack, the whole process will eventually stop.
> 
> I agree that saving the last fp per-stack could work.
> 
> > Proposal 2:
> > So far we have four types of stack: task, irq, overflow and sdei, could
> > we just assume that the MAX number of stack spanning is just 3
> > times?(task->irq->overflow->sdei or task->irq->sdei->overflow), if yes,
> > we can just check the number of stack spanning when we detect the stack
> > spans.
> 
> I also agree that counting the number of stack transitions will prevent
> an infinite loop, even if less accurately than proposal 1.
> 
> I don't have a strong preference either way.
Thank you for your comment.
Comparing proposals 1 and 2, I have decided to go with proposal 2,
since proposal 1 seems a little complicated and is not as easy to
extend as proposal 2 when a new stack is added.
A sample is below:
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 902f9ed..72d1f34 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -92,4 +92,22 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
        return false;
 }
 
+#define MAX_STACK_SPAN 3
+DECLARE_PER_CPU(int, num_stack_span);
+
+static inline bool on_same_stack(struct task_struct *tsk,
+                               unsigned long sp1, unsigned long sp2)
+{
+       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
+               return true;
+       if (on_irq_stack(sp1) && on_irq_stack(sp2))
+               return true;
+       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
+               return true;
+       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
+               return true;
+
+       return false;
+}
+
 #endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d5718a0..db905e8 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -27,6 +27,8 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+DEFINE_PER_CPU(int, num_stack_span);
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -56,6 +58,20 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
        frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
        frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
 
+       if (!on_same_stack(tsk, fp, frame->fp)) {
+               int num = (int)__this_cpu_read(num_stack_span);
+
+               if (num >= MAX_STACK_SPAN)
+                       return -EINVAL;
+               num++;
+               __this_cpu_write(num_stack_span, num);
+               fp = frame->fp + 0x8;
+       }
+       if (fp <= frame->fp) {
+               pr_notice("fp invalid, stop unwind\n");
+               return -EINVAL;
+       }
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        if (tsk->ret_stack &&
                        (frame->pc == (unsigned long)return_to_handler)) {
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index eb2d151..e6b5181 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -102,6 +102,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
        struct stackframe frame;
        int skip;
 
+       __this_cpu_write(num_stack_span, 0);
+
        pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
 
        if (!tsk)
@@ -144,6 +146,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
        } while (!unwind_frame(tsk, &frame));
 
        put_task_stack(tsk);
+       __this_cpu_write(num_stack_span, 0);
 }
 
 void show_stack(struct task_struct *tsk, unsigned long *sp)
-- 
1.9.1

If that is ok then I will submit a new patch.

Best Regards,
Ji
> 
> Thanks,
> Mark.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-04-11  6:30                             ` Ji.Zhang
@ 2018-04-11 10:46                               ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2018-04-11 10:46 UTC (permalink / raw)
  To: Ji.Zhang
  Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
	James Morse, Dave Martin, Marc Zyngier, Michael Weiser,
	Julien Thierry, Xie XiuQi, linux-arm-kernel, linux-kernel,
	linux-mediatek, wsd_upstream, shadanji

On Wed, Apr 11, 2018 at 02:30:28PM +0800, Ji.Zhang wrote:
> On Mon, 2018-04-09 at 12:26 +0100, Mark Rutland wrote:
> > On Sun, Apr 08, 2018 at 03:58:48PM +0800, Ji.Zhang wrote:
> > > Yes, I see where the loop is, I have missed that the loop may cross
> > > different stacks.
> > > Define a nesting order and check against is a good idea, and it can
> > > resolve the issue exactly, but as you mentioned before, we have no idea
> > > how to handle with overflow and sdei stack, and the nesting order is
> > > strongly related with the scenario of the stack, which means if someday
> > > we add another stack, we should consider the relationship of the new
> > > stack with other stacks. From the perspective of your experts, is that
> > > suitable for doing this in unwind?
> > > 
> > > Or could we just find some way easier but not so accurate, eg.
> > > Proposal 1: 
> > > When we do unwind and detect that the stack spans, record the last fp of
> > > previous stack and next time if we get into the same stack, compare it
> > > with that last fp, the new fp should still smaller than last fp, or
> > > there should be potential loop.
> > > For example, when we unwind from irq to task, we record the last fp in
> > > irq stack such as last_irq_fp, and if it unwind task stack back to irq
> > > stack, no matter if it is the same irq stack with previous, just let it
> > > go and compare the new irq fp with last_irq_fp, although the process may
> > > be wrong since from task stack it could not unwind to irq stack, but the
> > > whole process will eventually stop.
> > 
> > I agree that saving the last fp per-stack could work.
> > 
> > > Proposal 2:
> > > So far we have four types of stack: task, irq, overflow and sdei, could
> > > we just assume that the MAX number of stack spanning is just 3
> > > times?(task->irq->overflow->sdei or task->irq->sdei->overflow), if yes,
> > > we can just check the number of stack spanning when we detect the stack
> > > spans.
> > 
> > I also agree that counting the number of stack transitions will prevent
> > an infinite loop, even if less accurately than proposal 1.
> > 
> > I don't have a strong preference either way.
> Thank you for your comment.
> Comparing proposals 1 and 2, I have decided to go with proposal 2,
> since proposal 1 seems a little complicated and is not as easy to
> extend as proposal 2 when a new stack is added.
> A sample is below:
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index 902f9ed..72d1f34 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -92,4 +92,22 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
>         return false;
>  }
>  
> +#define MAX_STACK_SPAN 3

Depending on configuration we can have:

* task
* irq
* overflow (optional with VMAP_STACK)
* sdei (optional with ARM_SDE_INTERFACE && VMAP_STACK)

So 3 isn't always correct.

Also, could we please call this something like MAX_NR_STACKS?

> +DECLARE_PER_CPU(int, num_stack_span);

I'm pretty sure we can call unwind_frame() in a preemptible context, so
this isn't safe.

Put this counter into the struct stackframe, and call it something like
nr_stacks;

[...]

> +DEFINE_PER_CPU(int, num_stack_span);

As above, this can go.

> +
>  /*
>   * AArch64 PCS assigns the frame pointer to x29.
>   *
> @@ -56,6 +58,20 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
>  
> +       if (!on_same_stack(tsk, fp, frame->fp)) {
> +               int num = (int)__this_cpu_read(num_stack_span);
> +
> +               if (num >= MAX_STACK_SPAN)
> +                       return -EINVAL;
> +               num++;
> +               __this_cpu_write(num_stack_span, num);
> +               fp = frame->fp + 0x8;
> +       }
> +       if (fp <= frame->fp) {
> +               pr_notice("fp invalid, stop unwind\n");
> +               return -EINVAL;
> +       }

I think this can be simplified to something like:

	bool same_stack;

	same_stack = on_same_stack(tsk, fp, frame->fp);

	if (fp <= frame->fp && same_stack)
		return -EINVAL;
	if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
		return -EINVAL;

... assuming we add nr_stacks to struct stackframe.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-04-11 10:46                               ` Mark Rutland
@ 2018-04-12  6:13                                   ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-04-12  6:13 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, 2018-04-11 at 11:46 +0100, Mark Rutland wrote:
> On Wed, Apr 11, 2018 at 02:30:28PM +0800, Ji.Zhang wrote:
> > On Mon, 2018-04-09 at 12:26 +0100, Mark Rutland wrote:
> > > On Sun, Apr 08, 2018 at 03:58:48PM +0800, Ji.Zhang wrote:
> > > > Yes, I see where the loop is, I have missed that the loop may cross
> > > > different stacks.
> > > > Define a nesting order and check against is a good idea, and it can
> > > > resolve the issue exactly, but as you mentioned before, we have no idea
> > > > how to handle with overflow and sdei stack, and the nesting order is
> > > > strongly related with the scenario of the stack, which means if someday
> > > > we add another stack, we should consider the relationship of the new
> > > > stack with other stacks. From the perspective of your experts, is that
> > > > suitable for doing this in unwind?
> > > > 
> > > > Or could we just find some way easier but not so accurate, eg.
> > > > Proposal 1: 
> > > > When we do unwind and detect that the stack spans, record the last fp of
> > > > previous stack and next time if we get into the same stack, compare it
> > > > with that last fp, the new fp should still smaller than last fp, or
> > > > there should be potential loop.
> > > > For example, when we unwind from irq to task, we record the last fp in
> > > > irq stack such as last_irq_fp, and if it unwind task stack back to irq
> > > > stack, no matter if it is the same irq stack with previous, just let it
> > > > go and compare the new irq fp with last_irq_fp, although the process may
> > > > be wrong since from task stack it could not unwind to irq stack, but the
> > > > whole process will eventually stop.
> > > 
> > > I agree that saving the last fp per-stack could work.
> > > 
> > > > Proposal 2:
> > > > So far we have four types of stack: task, irq, overflow and sdei, could
> > > > we just assume that the MAX number of stack spanning is just 3
> > > > times?(task->irq->overflow->sdei or task->irq->sdei->overflow), if yes,
> > > > we can just check the number of stack spanning when we detect the stack
> > > > spans.
> > > 
> > > I also agree that counting the number of stack transitions will prevent
> > > an infinite loop, even if less accurately than proposal 1.
> > > 
> > > I don't have a strong preference either way.
> > Thank you for your comment.
> > Comparing proposals 1 and 2, I have decided to go with proposal 2,
> > since proposal 1 seems a little complicated and is not as easy to
> > extend as proposal 2 when a new stack is added.
> > A sample is below:
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index 902f9ed..72d1f34 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -92,4 +92,22 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
> >         return false;
> >  }
> >  
> > +#define MAX_STACK_SPAN 3
> 
> Depending on configuration we can have:
> 
> * task
> * irq
> * overflow (optional with VMAP_STACK)
> * sdei (optional with ARM_SDE_INTERFACE && VMAP_STACK)
> 
> So 3 isn't always correct.
> 
> Also, could we please call this something like MAX_NR_STACKS?
> 
> > +DECLARE_PER_CPU(int, num_stack_span);
> 
> I'm pretty sure we can call unwind_frame() in a preemptible context, so
> this isn't safe.
> 
> Put this counter into the struct stackframe, and call it something like
> nr_stacks;
> 
> [...]
> 
> > +DEFINE_PER_CPU(int, num_stack_span);
> 
> As above, this can go.
> 
> > +
> >  /*
> >   * AArch64 PCS assigns the frame pointer to x29.
> >   *
> > @@ -56,6 +58,20 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> >  
> > +       if (!on_same_stack(tsk, fp, frame->fp)) {
> > +               int num = (int)__this_cpu_read(num_stack_span);
> > +
> > +               if (num >= MAX_STACK_SPAN)
> > +                       return -EINVAL;
> > +               num++;
> > +               __this_cpu_write(num_stack_span, num);
> > +               fp = frame->fp + 0x8;
> > +       }
> > +       if (fp <= frame->fp) {
> > +               pr_notice("fp invalid, stop unwind\n");
> > +               return -EINVAL;
> > +       }
> 
> I think this can be simplified to something like:
> 
> 	bool same_stack;
> 
> 	same_stack = on_same_stack(tsk, fp, frame->fp);
> 
> 	if (fp <= frame->fp && same_stack)
> 		return -EINVAL;
> 	if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
> 		return -EINVAL;
> 
> ... assuming we add nr_stacks to struct stackframe.
Thank you very much for your advice; it is very valuable.
The code, modified according to your suggestions, is as follows.
I made one small change: MAX_NR_STACKS is defined as the number of
stacks, instead of the number of stack spans.

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 902f9ed..f235b86 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -24,9 +24,18 @@
 #include <asm/ptrace.h>
 #include <asm/sdei.h>
 
+#ifndef CONFIG_VMAP_STACK
+#define MAX_NR_STACKS  2
+#elif !defined(CONFIG_ARM_SDE_INTERFACE)
+#define MAX_NR_STACKS  3
+#else
+#define MAX_NR_STACKS  4
+#endif
+
 struct stackframe {
        unsigned long fp;
        unsigned long pc;
+       int nr_stacks;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        int graph;
 #endif
@@ -92,4 +101,20 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
        return false;
 }
 
+
+static inline bool on_same_stack(struct task_struct *tsk,
+                               unsigned long sp1, unsigned long sp2)
+{
+       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
+               return true;
+       if (on_irq_stack(sp1) && on_irq_stack(sp2))
+               return true;
+       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
+               return true;
+       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
+               return true;
+
+       return false;
+}
+
 #endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d5718a0..a09e247 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -27,6 +27,7 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -43,6 +44,7 @@
 int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 {
        unsigned long fp = frame->fp;
+       bool same_stack;
 
        if (fp & 0xf)
                return -EINVAL;
@@ -56,6 +58,13 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
        frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
        frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
 
+       same_stack = on_same_stack(tsk, fp, frame->fp);
+
+       if (fp <= frame->fp && same_stack)
+               return -EINVAL;
+       if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
+               return -EINVAL;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        if (tsk->ret_stack &&
                        (frame->pc == (unsigned long)return_to_handler)) {
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index eb2d151..3b1c472 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -120,6 +120,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
                frame.fp = thread_saved_fp(tsk);
                frame.pc = thread_saved_pc(tsk);
        }
+       frame.nr_stacks = 1;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        frame.graph = tsk->curr_ret_stack;
 #endif
-- 
1.9.1

Best Regards,
Ji

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH] arm64: avoid race condition issue in dump_backtrace
@ 2018-04-12  6:13                                   ` Ji.Zhang
  0 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-04-12  6:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2018-04-11 at 11:46 +0100, Mark Rutland wrote:
> On Wed, Apr 11, 2018 at 02:30:28PM +0800, Ji.Zhang wrote:
> > On Mon, 2018-04-09 at 12:26 +0100, Mark Rutland wrote:
> > > On Sun, Apr 08, 2018 at 03:58:48PM +0800, Ji.Zhang wrote:
> > > > Yes, I see where the loop is, I have missed that the loop may cross
> > > > different stacks.
> > > > Define a nesting order and check against is a good idea, and it can
> > > > resolve the issue exactly, but as you mentioned before, we have no idea
> > > > how to handle with overflow and sdei stack, and the nesting order is
> > > > strongly related with the scenario of the stack, which means if someday
> > > > we add another stack, we should consider the relationship of the new
> > > > stack with other stacks. From the perspective of your experts, is that
> > > > suitable for doing this in unwind?
> > > > 
> > > > Or could we just find some way easier but not so accurate, eg.
> > > > Proposal 1: 
> > > > When we do unwind and detect that the stack spans, record the last fp of
> > > > previous stack and next time if we get into the same stack, compare it
> > > > with that last fp, the new fp should still smaller than last fp, or
> > > > there should be potential loop.
> > > > For example, when we unwind from irq to task, we record the last fp in
> > > > irq stack such as last_irq_fp, and if it unwind task stack back to irq
> > > > stack, no matter if it is the same irq stack with previous, just let it
> > > > go and compare the new irq fp with last_irq_fp, although the process may
> > > > be wrong since from task stack it could not unwind to irq stack, but the
> > > > whole process will eventually stop.
> > > 
> > > I agree that saving the last fp per-stack could work.
> > > 
> > > > Proposal 2:
> > > > So far we have four types of stack: task, irq, overflow and sdei, could
> > > > we just assume that the MAX number of stack spanning is just 3
> > > > times?(task->irq->overflow->sdei or task->irq->sdei->overflow), if yes,
> > > > we can just check the number of stack spanning when we detect the stack
> > > > spans.
> > > 
> > > I also agree that counting the number of stack transitions will prevent
> > > an inifinite loop, even if less accurately than proposal 1.
> > > 
> > > I don't have a strong preference either way.
> > Thank you for your comment.
> > Compared with proposal 1 and 2, I decide to use proposal2 since
> > proposal1 seems a little complicated and it is not as easy as proposal2
> > when new stack is added.
> > The sample is as below:
> > diff --git a/arch/arm64/include/asm/stacktrace.h
> > b/arch/arm64/include/asm/stacktrace.h
> > index 902f9ed..72d1f34 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -92,4 +92,22 @@ static inline bool on_accessible_stack(struct
> > task_struct *tsk, unsigned long sp
> >         return false;
> >  }
> >  
> > +#define MAX_STACK_SPAN 3
> 
> Depending on configuration we can have:
> 
> * task
> * irq
> * overflow (optional with VMAP_STACK)
> * sdei (optional with ARM_SDE_INTERFACE && VMAP_STACK)
> 
> So 3 isn't always correct.
> 
> Also, could we please call this something like MAX_NR_STACKS?
> 
> > +DECLARE_PER_CPU(int, num_stack_span);
> 
> I'm pretty sure we can call unwind_frame() in a preemptible context, so
> this isn't safe.
> 
> Put this counter into the struct stackframe, and call it something like
> nr_stacks;
> 
> [...]
> 
> > +DEFINE_PER_CPU(int, num_stack_span);
> 
> As above, this can go.
> 
> > +
> >  /*
> >   * AArch64 PCS assigns the frame pointer to x29.
> >   *
> > @@ -56,6 +58,20 @@ int notrace unwind_frame(struct task_struct *tsk,
> > struct stackframe *frame)
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> >  
> > +       if (!on_same_stack(tsk, fp, frame->fp)) {
> > +               int num = (int)__this_cpu_read(num_stack_span);
> > +
> > +               if (num >= MAX_STACK_SPAN)
> > +                       return -EINVAL;
> > +               num++;
> > +               __this_cpu_write(num_stack_span, num);
> > +               fp = frame->fp + 0x8;
> > +       }
> > +       if (fp <= frame->fp) {
> > +               pr_notice("fp invalid, stop unwind\n");
> > +               return -EINVAL;
> > +       }
> 
> I think this can be simplified to something like:
> 
> 	bool same_stack;
> 
> 	same_stack = on_same_stack(tsk, fp, frame->fp);
> 
> 	if (fp <= frame->fp && same_stack)
> 		return -EINVAL;
> 	if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
> 		return -EINVAL;
> 
> ... assuming we add nr_stacks to struct stackframe.
Thank you very much for your advice; it is very valuable.
Following your suggestion, the modified code is as follows.
I made one small change: MAX_NR_STACKS is defined as the number of
stacks, rather than the number of stack spans.

diff --git a/arch/arm64/include/asm/stacktrace.h
b/arch/arm64/include/asm/stacktrace.h
index 902f9ed..f235b86 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -24,9 +24,18 @@
 #include <asm/ptrace.h>
 #include <asm/sdei.h>
 
+#ifndef CONFIG_VMAP_STACK
+#define MAX_NR_STACKS  2
+#elif !defined(CONFIG_ARM_SDE_INTERFACE)
+#define MAX_NR_STACKS  3
+#else
+#define MAX_NR_STACKS  4
+#endif
+
 struct stackframe {
        unsigned long fp;
        unsigned long pc;
+       int nr_stacks;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        int graph;
 #endif
@@ -92,4 +101,20 @@ static inline bool on_accessible_stack(struct
task_struct *tsk, unsigned long sp
        return false;
 }
 
+
+static inline bool on_same_stack(struct task_struct *tsk,
+                               unsigned long sp1, unsigned long sp2)
+{
+       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
+               return true;
+       if (on_irq_stack(sp1) && on_irq_stack(sp2))
+               return true;
+       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
+               return true;
+       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
+               return true;
+
+       return false;
+}
+
 #endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c
b/arch/arm64/kernel/stacktrace.c
index d5718a0..a09e247 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -27,6 +27,7 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -43,6 +44,7 @@
 int notrace unwind_frame(struct task_struct *tsk, struct stackframe
*frame)
 {
        unsigned long fp = frame->fp;
+       bool same_stack;
 
        if (fp & 0xf)
                return -EINVAL;
@@ -56,6 +58,13 @@ int notrace unwind_frame(struct task_struct *tsk,
struct stackframe *frame)
        frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
        frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
 
+       same_stack = on_same_stack(tsk, fp, frame->fp);
+
+       if (fp <= frame->fp && same_stack)
+               return -EINVAL;
+       if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
+               return -EINVAL;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        if (tsk->ret_stack &&
                        (frame->pc == (unsigned long)return_to_handler))
{
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index eb2d151..3b1c472 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -120,6 +120,7 @@ void dump_backtrace(struct pt_regs *regs, struct
task_struct *tsk)
                frame.fp = thread_saved_fp(tsk);
                frame.pc = thread_saved_pc(tsk);
        }
+       frame.nr_stacks = 1;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        frame.graph = tsk->curr_ret_stack;
 #endif
-- 
1.9.1

Best Regards,
Ji

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
  2018-04-12  6:13                                   ` Ji.Zhang
@ 2018-04-20  5:43                                     ` Ji.Zhang
  -1 siblings, 0 replies; 36+ messages in thread
From: Ji.Zhang @ 2018-04-20  5:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, Xie XiuQi, Ard Biesheuvel,
	Marc Zyngier, Catalin Marinas, Julien Thierry, Will Deacon,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, shadanji-9Onoh4P/yGk,
	James Morse, Matthias Brugger,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Weiser,
	Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, 2018-04-12 at 14:13 +0800, Ji.Zhang wrote:
> On Wed, 2018-04-11 at 11:46 +0100, Mark Rutland wrote:
> > On Wed, Apr 11, 2018 at 02:30:28PM +0800, Ji.Zhang wrote:
> > > On Mon, 2018-04-09 at 12:26 +0100, Mark Rutland wrote:
> > > > On Sun, Apr 08, 2018 at 03:58:48PM +0800, Ji.Zhang wrote:
> > > > > Yes, I see where the loop is; I had missed that the loop may cross
> > > > > different stacks.
> > > > > Defining a nesting order and checking against it is a good idea,
> > > > > and it would resolve the issue exactly. But as you mentioned
> > > > > before, it is unclear how to handle the overflow and sdei stacks,
> > > > > and the nesting order is strongly tied to each stack's usage
> > > > > scenario, which means that if we ever add another stack, we must
> > > > > reconsider its relationship with the other stacks. In your expert
> > > > > view, is it suitable to do this in the unwinder?
> > > > > 
> > > > > Or could we find an easier, if less accurate, way? E.g.:
> > > > > Proposal 1:
> > > > > When we unwind and detect that the unwind spans stacks, record the
> > > > > last fp of the previous stack; the next time we enter the same
> > > > > stack, compare the new fp with that recorded last fp. The new fp
> > > > > must still make forward progress relative to that last fp;
> > > > > otherwise there is a potential loop.
> > > > > For example, when we unwind from the irq stack to the task stack,
> > > > > we record the last fp seen on the irq stack (e.g. last_irq_fp). If
> > > > > the unwind later steps from the task stack back onto an irq stack,
> > > > > no matter whether it is the same irq stack as before, we let it
> > > > > continue and compare the new irq fp against last_irq_fp. The unwind
> > > > > itself may be wrong, since a task stack should never unwind back to
> > > > > an irq stack, but the whole process will eventually terminate.
> > > > 
> > > > I agree that saving the last fp per-stack could work.
> > > > 
> > > > > Proposal 2:
> > > > > So far we have four types of stack: task, irq, overflow and sdei, could
> > > > > we just assume that the MAX number of stack spanning is just 3
> > > > > times?(task->irq->overflow->sdei or task->irq->sdei->overflow), if yes,
> > > > > we can just check the number of stack spanning when we detect the stack
> > > > > spans.
> > > > 
> > > > I also agree that counting the number of stack transitions will prevent
> > > > an infinite loop, even if less accurately than proposal 1.
> > > > 
> > > > I don't have a strong preference either way.
> > > Thank you for your comment.
> > > Comparing proposals 1 and 2, I have decided to go with proposal 2,
> > > since proposal 1 seems a little complicated and does not adapt as
> > > easily as proposal 2 when a new stack is added.
> > > A sample is below:
> > > diff --git a/arch/arm64/include/asm/stacktrace.h
> > > b/arch/arm64/include/asm/stacktrace.h
> > > index 902f9ed..72d1f34 100644
> > > --- a/arch/arm64/include/asm/stacktrace.h
> > > +++ b/arch/arm64/include/asm/stacktrace.h
> > > @@ -92,4 +92,22 @@ static inline bool on_accessible_stack(struct
> > > task_struct *tsk, unsigned long sp
> > >         return false;
> > >  }
> > >  
> > > +#define MAX_STACK_SPAN 3
> > 
> > Depending on configuration we can have:
> > 
> > * task
> > * irq
> > * overflow (optional with VMAP_STACK)
> > * sdei (optional with ARM_SDE_INTERFACE && VMAP_STACK)
> > 
> > So 3 isn't always correct.
> > 
> > Also, could we please call this something like MAX_NR_STACKS?
> > 
> > > +DECLARE_PER_CPU(int, num_stack_span);
> > 
> > I'm pretty sure we can call unwind_frame() in a preemptible context, so
> > this isn't safe.
> > 
> > Put this counter into the struct stackframe, and call it something like
> > nr_stacks;
> > 
> > [...]
> > 
> > > +DEFINE_PER_CPU(int, num_stack_span);
> > 
> > As above, this can go.
> > 
> > > +
> > >  /*
> > >   * AArch64 PCS assigns the frame pointer to x29.
> > >   *
> > > @@ -56,6 +58,20 @@ int notrace unwind_frame(struct task_struct *tsk,
> > > struct stackframe *frame)
> > >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> > >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > >  
> > > +       if (!on_same_stack(tsk, fp, frame->fp)) {
> > > +               int num = (int)__this_cpu_read(num_stack_span);
> > > +
> > > +               if (num >= MAX_STACK_SPAN)
> > > +                       return -EINVAL;
> > > +               num++;
> > > +               __this_cpu_write(num_stack_span, num);
> > > +               fp = frame->fp + 0x8;
> > > +       }
> > > +       if (fp <= frame->fp) {
> > > +               pr_notice("fp invalid, stop unwind\n");
> > > +               return -EINVAL;
> > > +       }
> > 
> > I think this can be simplified to something like:
> > 
> > 	bool same_stack;
> > 
> > 	same_stack = on_same_stack(tsk, fp, frame->fp);
> > 
> > 	if (fp <= frame->fp && same_stack)
> > 		return -EINVAL;
> > 	if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
> > 		return -EINVAL;
> > 
> > ... assuming we add nr_stacks to struct stackframe.
> Thank you very much for your advice; it is very valuable.
> Following your suggestion, the modified code is as follows.
> I made one small change: MAX_NR_STACKS is defined as the number of
> stacks, rather than the number of stack spans.
> 
> diff --git a/arch/arm64/include/asm/stacktrace.h
> b/arch/arm64/include/asm/stacktrace.h
> index 902f9ed..f235b86 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -24,9 +24,18 @@
>  #include <asm/ptrace.h>
>  #include <asm/sdei.h>
>  
> +#ifndef CONFIG_VMAP_STACK
> +#define MAX_NR_STACKS  2
> +#elif !defined(CONFIG_ARM_SDE_INTERFACE)
> +#define MAX_NR_STACKS  3
> +#else
> +#define MAX_NR_STACKS  4
> +#endif
> +
>  struct stackframe {
>         unsigned long fp;
>         unsigned long pc;
> +       int nr_stacks;
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         int graph;
>  #endif
> @@ -92,4 +101,20 @@ static inline bool on_accessible_stack(struct
> task_struct *tsk, unsigned long sp
>         return false;
>  }
>  
> +
> +static inline bool on_same_stack(struct task_struct *tsk,
> +                               unsigned long sp1, unsigned long sp2)
> +{
> +       if (on_task_stack(tsk, sp1) && on_task_stack(tsk, sp2))
> +               return true;
> +       if (on_irq_stack(sp1) && on_irq_stack(sp2))
> +               return true;
> +       if (on_overflow_stack(sp1) && on_overflow_stack(sp2))
> +               return true;
> +       if (on_sdei_stack(sp1) && on_sdei_stack(sp2))
> +               return true;
> +
> +       return false;
> +}
> +
>  #endif /* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/stacktrace.c
> b/arch/arm64/kernel/stacktrace.c
> index d5718a0..a09e247 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -27,6 +27,7 @@
>  #include <asm/stack_pointer.h>
>  #include <asm/stacktrace.h>
>  
> +
>  /*
>   * AArch64 PCS assigns the frame pointer to x29.
>   *
> @@ -43,6 +44,7 @@
>  int notrace unwind_frame(struct task_struct *tsk, struct stackframe
> *frame)
>  {
>         unsigned long fp = frame->fp;
> +       bool same_stack;
>  
>         if (fp & 0xf)
>                 return -EINVAL;
> @@ -56,6 +58,13 @@ int notrace unwind_frame(struct task_struct *tsk,
> struct stackframe *frame)
>         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
>  
> +       same_stack = on_same_stack(tsk, fp, frame->fp);
> +
> +       if (fp <= frame->fp && same_stack)
> +               return -EINVAL;
> +       if (!same_stack && ++frame->nr_stacks > MAX_NR_STACKS)
> +               return -EINVAL;
> +
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         if (tsk->ret_stack &&
>                         (frame->pc == (unsigned long)return_to_handler))
> {
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index eb2d151..3b1c472 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -120,6 +120,7 @@ void dump_backtrace(struct pt_regs *regs, struct
> task_struct *tsk)
>                 frame.fp = thread_saved_fp(tsk);
>                 frame.pc = thread_saved_pc(tsk);
>         }
> +       frame.nr_stacks = 1;
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         frame.graph = tsk->curr_ret_stack;
>  #endif
Hi all,

Since the discussion has drifted far from the original topic of this
patch, I have submitted a new patch, "arm64: avoid potential infinity
loop in dump_backtrace", based on the latest sample code.
We can continue the discussion in that new thread.

Thanks,
Ji

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-04-20  5:43 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-22  3:06 [PATCH] arm64: avoid race condition issue in dump_backtrace Ji Zhang
2018-03-22  3:06 ` Ji Zhang
2018-03-22  3:06 ` Ji Zhang
2018-03-22  4:57 ` Baruch Siach
2018-03-22  4:57   ` Baruch Siach
     [not found]   ` <20180322045710.vmmjr2wcankea45o-MwjkAAnuF3khR1HGirfZ1z4kX+cae0hd@public.gmane.org>
2018-03-22  5:33     ` Ji.Zhang
2018-03-22  5:33       ` Ji.Zhang
2018-03-22  5:59 ` Mark Rutland
2018-03-22  5:59   ` Mark Rutland
2018-03-22  9:35   ` Ji.Zhang
2018-03-22  9:35     ` Ji.Zhang
2018-03-26 11:39     ` Mark Rutland
2018-03-26 11:39       ` Mark Rutland
     [not found]       ` <20180326113932.2i6qp3776jtmcqk4-agMKViyK24J5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2018-03-28  9:33         ` Ji.Zhang
2018-03-28  9:33           ` Ji.Zhang
2018-03-28 10:12           ` Mark Rutland
2018-03-28 10:12             ` Mark Rutland
     [not found]             ` <20180328101240.moo44g5qd3qjuxgn-agMKViyK24J5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2018-03-30  8:08               ` Ji.Zhang
2018-03-30  8:08                 ` Ji.Zhang
2018-04-04  9:04                 ` Mark Rutland
2018-04-04  9:04                   ` Mark Rutland
     [not found]                   ` <20180404090431.rqwtaqovipxa5gta-agMKViyK24J5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2018-04-08  7:58                     ` Ji.Zhang
2018-04-08  7:58                       ` Ji.Zhang
2018-04-09 11:26                       ` Mark Rutland
2018-04-09 11:26                         ` Mark Rutland
     [not found]                         ` <20180409112559.uh76jpiytznymw6w-agMKViyK24J5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2018-04-11  6:30                           ` Ji.Zhang
2018-04-11  6:30                             ` Ji.Zhang
2018-04-11 10:46                             ` Mark Rutland
2018-04-11 10:46                               ` Mark Rutland
     [not found]                               ` <20180411104656.4afhb4durpntxqxl-agMKViyK24J5pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2018-04-12  6:13                                 ` Ji.Zhang
2018-04-12  6:13                                   ` Ji.Zhang
2018-04-20  5:43                                   ` Ji.Zhang
2018-04-20  5:43                                     ` Ji.Zhang
2018-03-24 11:01 ` kbuild test robot
2018-03-24 11:01   ` kbuild test robot
2018-03-24 11:01   ` kbuild test robot
