From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751964AbbJLOxM (ORCPT ); Mon, 12 Oct 2015 10:53:12 -0400
Received: from mail-pa0-f42.google.com ([209.85.220.42]:33080 "EHLO
	mail-pa0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751630AbbJLOxJ convert rfc822-to-8bit (ORCPT );
	Mon, 12 Oct 2015 10:53:09 -0400
Subject: Re: [PATCH v4 2/2] arm64: Expand the stack trace feature to support IRQ stack
Mime-Version: 1.0 (Apple Message framework v1283)
Content-Type: text/plain; charset=windows-1252
From: Jungseok Lee
In-Reply-To: <5617CE26.10604@arm.com>
Date: Mon, 12 Oct 2015 23:53:04 +0900
Cc: takahiro.akashi@linaro.org, catalin.marinas@arm.com,
	will.deacon@arm.com, linux-arm-kernel@lists.infradead.org,
	mark.rutland@arm.com, barami97@gmail.com, linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8BIT
Message-Id: <07A53E87-C562-48D1-86DF-A373EAAA73F9@gmail.com>
References: <1444231692-32722-1-git-send-email-jungseoklee85@gmail.com>
	<1444231692-32722-3-git-send-email-jungseoklee85@gmail.com>
	<5617CE26.10604@arm.com>
To: James Morse
X-Mailer: Apple Mail (2.1283)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Oct 9, 2015, at 11:24 PM, James Morse wrote:

> Hi Jungseok,

Hi James,

> On 07/10/15 16:28, Jungseok Lee wrote:
>> Currently, a call trace drops a process stack walk when a separate IRQ
>> stack is used. It makes a call trace information much less useful when
>> a system gets paniked in interrupt context.
>
> panicked

I will fix the typo.

>> This patch addresses the issue with the following schemes:
>>
>> - Store aborted stack frame data
>> - Decide whether another stack walk is needed or not via current sp
>> - Loosen the frame pointer upper bound condition
>
> It may be worth merging this patch with its predecessor - anyone trying to
> bisect a problem could land between these two patches, and spend time
> debugging the truncated call traces.

It was my original intention to lead them to this patch, not the [1/2] one.
This separation would help anyone touching the call trace feature, including
me, focus on these changes apart from stack allocation, the IRQ recursion
check and thread_info management. In addition, I would like to add a clear
and sufficient explanation of the frame pointer condition.

>> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
>> index 6ea82e8..e5904a1 100644
>> --- a/arch/arm64/include/asm/irq.h
>> +++ b/arch/arm64/include/asm/irq.h
>> @@ -2,13 +2,25 @@
>>  #define __ASM_IRQ_H
>>
>>  #include
>> +#include
>>
>>  #include
>>
>>  struct irq_stack {
>>  	void *stack;
>> +	struct stackframe frame;
>>  };
>>
>> +DECLARE_PER_CPU(struct irq_stack, irq_stacks);
>
> Good idea, storing this in the per-cpu data makes it immune to stack
> corruption.
>
>
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 407991b..5124649 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -43,7 +43,27 @@ int notrace unwind_frame(struct stackframe *frame)
>>  	low = frame->sp;
>>  	high = ALIGN(low, THREAD_SIZE);
>>
>> -	if (fp < low || fp > high - 0x18 || fp & 0xf)
>> +	/*
>> +	 * A frame pointer would reach an upper bound if a prologue of the
>> +	 * first function of call trace looks as follows:
>> +	 *
>> +	 * stp x29, x30, [sp,#-16]!
>> +	 * mov x29, sp
>> +	 *
>> +	 * Thus, the upper bound is (top of stack - 0x20) with consideration
>
> The terms 'top' and 'bottom' of the stack are confusing, your 'top' appears
> to be the highest address, which is used first, making it the bottom of the
> stack.
>
> I would try to use the terms low/est and high/est, in keeping with the
> variable names in use here.

Good idea. I'm in favor of those terms.

>> +	 * of a 16-byte empty space in THREAD_START_SP.
>> +	 *
>> +	 * The value, 0x20, however, does not cover all cases as interrupts
>> +	 * are handled using a separate stack. That is, a call trace can start
>> +	 * from elx_irq exception vectors. The symbols could not be promoted
>> +	 * to candidates for a stack trace under the restriction, 0x20.
>> +	 *
>> +	 * The scenario is handled without complexity as 1) considering
>> +	 * (bottom of stack + THREAD_START_SP) as a dummy frame pointer, the
>> +	 * content of which is 0, and 2) allowing the case, which changes
>> +	 * the value to 0x10 from 0x20.
>
> Where has 0x20 come from? The old value was 0x18.

What I meant is that 0x20 is the offset of the highest valid frame pointer.
The comment should have described this more clearly.

> My understanding is the highest part of the stack looks like this:
> high        [ off-stack ]
> high - 0x08 [ left free by THREAD_START_SP ]
> high - 0x10 [ left free by THREAD_START_SP ]
> high - 0x18 [#1 x30 ]
> high - 0x20 [#1 x29 ]

A clearer description than mine!

> So the condition 'fp > high - 0x18' prevents returning either 'left free'
> address, or off-stack-value as a frame. Changing it to 'fp > high - 0x10'
> allows the first half of that reserved area to be a valid stack frame.

I believe my understanding is aligned with yours. Under the current
condition, 'fp > high - 0x18', it is impossible to catch the 'el1_irq'
symbol. This is why I set x29 to high - 0x10 and changed the frame pointer
condition, but the changes fail to cover perf according to your data.

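The three candidate bounds discussed in this thread can be compared side by
side. The following is only a minimal user-space sketch, not the kernel code:
fp_is_valid(), align_up(), the enum names and EXAMPLE_THREAD_SIZE are
hypothetical names introduced for illustration, and the 16KB stack size is an
assumption. Only the three conditions themselves are taken from the patch and
the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stack size for this example only; not taken from the patch. */
#define EXAMPLE_THREAD_SIZE 0x4000UL

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical labels for the three upper-bound checks in the thread. */
enum bound {
	BOUND_OLD,	/* fp >  high - 0x18 is rejected (before the patch) */
	BOUND_PATCH,	/* fp >  high - 0x10 is rejected (this patch)       */
	BOUND_PERF_FIX,	/* fp >= high - 0x10 is rejected (review proposal)  */
};

/*
 * Return true if fp would be accepted as a frame pointer, given the
 * current sp. The lower-bound and 16-byte alignment checks are the
 * same in all three variants; only the upper bound differs.
 */
static bool fp_is_valid(uint64_t sp, uint64_t fp, enum bound b)
{
	uint64_t low = sp;
	uint64_t high = align_up(low, EXAMPLE_THREAD_SIZE);
	bool too_high;

	switch (b) {
	case BOUND_OLD:
		too_high = fp > high - 0x18;
		break;
	case BOUND_PATCH:
		too_high = fp > high - 0x10;
		break;
	default:
		too_high = fp >= high - 0x10;
		break;
	}

	return !(fp < low || too_high || (fp & 0xf));
}
```

With sp = 0x13f00 the sketch computes high = 0x14000. fp = high - 0x20 is
accepted by all three variants, which is where the 0x20 comes from: the 0x18
bound combined with the 16-byte alignment check leaves high - 0x20 as the
highest accepted value. fp = high - 0x10, the dummy frame pointer, is
accepted only by the 'fp > high - 0x10' variant.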
> This change is breaking perf using incantations [0] and [1]:

I'm reviewing how the perf stack trace works.

> Before, with just patch 1/2:
> ---__do_softirq
>    |
>    |--92.95%-- __handle_domain_irq
>    |          __irqentry_text_start
>    |          el1_irq
>    |
>
> After, with both patches:
> ---__do_softirq
>    |
>    |--83.83%-- __handle_domain_irq
>    |          __irqentry_text_start
>    |          el1_irq
>    |          |
>    |          |--99.39%-- 0x400008040d00000c
>    |           --0.61%-- [...]
>    |
>
> Changing the condition to 'fp >= high - 0x10' fixes this.

'fp >= high - 0x10' drops 'el1_irq' when dump_stack() or panic() is called.

> I agree it needs documenting, it is quite fiddly - I think Akashi Takahiro
> is the expert.

If possible, it would be greatly helpful.

> I think unwind_frame() needs to walk the irq stack too. [2] is an example
> of perf tracing back to userspace, (and there are patches on the list to
> do/fix this), so we need to walk back to the start of the first stack for
> the perf accounting to be correct.

Frankly, I missed the case where perf backtraces to userspace.

IMO, this statement supports why the stack trace feature commit should be
written independently. The [1/2] patch would be pretty stable if 64KB pages
are supported. The separation might help us concentrate on the stack trace
feature from a generic dump stack, perf, and ftrace point of view.

>> +	 */
>> +	if (fp < low || fp > high - 0x10 || fp & 0xf)
>>  		return -EINVAL;
>>
>>  	frame->sp = fp + 0x10;
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index f93aae5..44b2f828 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -146,6 +146,8 @@ static void dump_instr(const char *lvl, struct pt_regs *regs)
>>  static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>>  {
>>  	struct stackframe frame;
>> +	unsigned int cpu = smp_processor_id();
>
> I wonder if there is any case where dump_backtrace() is called on another cpu?
>
> Setting the cpu value from task_thread_info(tsk)->cpu would protect against
> this.

IMO, no, but your suggestion makes sense. I will update it.

>> +	bool in_irq = in_irq_stack(cpu);
>>
>>  	pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>>
>> @@ -170,6 +172,10 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>>  	}
>>
>>  	pr_emerg("Call trace:\n");
>> +repeat:
>> +	if (in_irq)
>> +		pr_emerg("\n");
>
> Do we need these? 'el1_irq()' in the trace is a giveaway…

I borrowed this idea from the x86 implementation in order to show a separate
stack explicitly. There is no problem with removing these tags, and .

Great thanks!

Best Regards
Jungseok Lee