linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated
@ 2022-04-12  7:40 Dmitry Monakhov
  2022-04-12  7:40 ` [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs Dmitry Monakhov
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2022-04-12  7:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, mingo, Dmitry Monakhov

get_stack_info() detects the stack type only from the start address, so we must
check that the whole address range in question is covered by the detected stack.

Otherwise the following crash is possible:
-> unwind_next_frame
   case ORC_TYPE_REGS:
     if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
     -> deref_stack_regs
       -> stack_access_ok  <- here addr is inside the stack range, but addr+len-1 is not, yet we still return success
     *ip = READ_ONCE_NOCHECK(regs->ip); <- here we hit the stack guard page fault
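
For reference, a condensed sketch of the current stack_access_ok() (comments
added; not the verbatim kernel code) shows how the range escapes validation:

static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
			    size_t len)
{
	struct stack_info *info = &state->stack_info;
	void *addr = (void *)_addr;

	/* Range not fully on the currently known stack: reclassify it. */
	if (!on_stack(info, addr, len) &&
	    get_stack_info(addr, state->task, info, &state->stack_mask))
		return false;

	/*
	 * get_stack_info() classified the stack from addr alone, so
	 * addr+len-1 may still fall past the end of that stack, yet we
	 * report success without rechecking the range.
	 */
	return true;
}
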
OOPS LOG:
<0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
<4>[ 1941.845744] kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
<4>[ 1941.845744] CPU: 93 PID: 23787 Comm: context_switch1 Not tainted 5.4.145 #1
<4>[ 1941.845745] Hardware name: XXXXXXXXXXXXXX
<4>[ 1941.845746] RIP: 0010:unwind_next_frame+0x311/0x5b0
<4>[ 1941.845746] Code: 01 0f 84 f6 01 00 00 0f 83 cc fe ff ff e9 f9 fe ff ff ba a8 00 00 00 4c 89 fe 48 89 df e8 57 fa ff ff 84 c0 0f 84 d9 00 00 00 <49> 8b 87 80 00 00 00 48 89 43 48 49 8b 87 98 00 00 00 4c 89 7b 50
<4>[ 1941.845747] RSP: 0018:fffffe00012908f0 EFLAGS: 00010002
<4>[ 1941.845748] RAX: 0000000000000001 RBX: fffffe0001290930 RCX: 0000000000000001
<4>[ 1941.845748] RDX: 0000000000000002 RSI: ffff893d94719e80 RDI: ffffc9001b9d7fc8
<4>[ 1941.845748] RBP: 0000000000000004 R08: ffffffff81a009bf R09: ffffffff824c7c20
<4>[ 1941.845749] R10: ffffffff824c7c1c R11: 0000000000000023 R12: ffffffff826c9cde
<4>[ 1941.845749] R13: ffffffff81a009d2 R14: fffffe0001289ff0 R15: ffffc9001b9d7fc8
<4>[ 1941.845749] FS:  00007f57ff9aa700(0000) GS:ffff893f6f140000(0000) knlGS:0000000000000000
<4>[ 1941.845750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1941.845750] CR2: ffffc9001b9d8048 CR3: 000000bcd3936000 CR4: 0000000000340ee0
<4>[ 1941.845750] Call Trace:
<4>[ 1941.845750]  <NMI>
<4>[ 1941.845751]  perf_callchain_kernel+0x12b/0x140
<4>[ 1941.845751]  ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845751]  get_perf_callchain+0x10d/0x280
<4>[ 1941.845751]  perf_callchain+0x6e/0x80
<4>[ 1941.845752]  perf_prepare_sample+0x87/0x540
<4>[ 1941.845752]  perf_event_output_forward+0x31/0x90
<4>[ 1941.845752]  ? sched_clock+0x5/0x10
<4>[ 1941.845752]  ? sched_clock_cpu+0xc/0xb0
<4>[ 1941.845753]  ? arch_perf_update_userpage+0xd0/0xe0
<4>[ 1941.845753]  ? sched_clock+0x5/0x10
<4>[ 1941.845753]  ? sched_clock_cpu+0xc/0xb0
<4>[ 1941.845753]  __perf_event_overflow+0x5a/0xf0
<4>[ 1941.845754]  perf_ibs_handle_irq+0x340/0x5b0
<4>[ 1941.845754]  ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845754]  ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845754]  ? __set_pte_vaddr+0x28/0x40
<4>[ 1941.845755]  ? __set_pte_vaddr+0x28/0x40
<4>[ 1941.845755]  ? set_pte_vaddr+0x3c/0x60
<4>[ 1941.845755]  ? __native_set_fixmap+0x24/0x30
<4>[ 1941.845755]  ? native_set_fixmap+0x3c/0xb0
<4>[ 1941.845756]  ? ghes_copy_tofrom_phys+0x98/0x130
<4>[ 1941.845756]  ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845756]  ? __ghes_peek_estatus.isra.16+0x49/0xb0
<4>[ 1941.845756]  ? perf_ibs_nmi_handler+0x34/0x60
<4>[ 1941.845757]  ? sched_clock+0x5/0x10
<4>[ 1941.845757]  perf_ibs_nmi_handler+0x34/0x60
<4>[ 1941.845757]  nmi_handle+0x79/0x190
<4>[ 1941.845757]  default_do_nmi+0x3e/0x110
<4>[ 1941.845758]  do_nmi+0x18d/0x1e0
<4>[ 1941.845758]  end_repeat_nmi+0x16/0x50
<4>[ 1941.845758] RIP: 0010:syscall_return_via_sysret+0x26/0x7f
<4>[ 1941.845759] Code: 3e 09 00 00 41 5f 41 5e 41 5d 41 5c 5d 5b 5e 41 5a 41 59 41 58 58 5e 5a 5e 48 89 e7 65 48 8b 24 25 04 60 00 00 ff 77 28 ff 37 <50> eb 43 0f 20 df eb 34 48 89 f8 48 81 e7 ff 07 00 00 65 48 0f a3
<4>[ 1941.845759] RSP: 0018:fffffe0001289ff0 EFLAGS: 00000046
<4>[ 1941.845760] RAX: 0000000000000001 RBX: 0000559a21ff71d0 RCX: 00007f57ff286260
<4>[ 1941.845760] RDX: 0000000000000001 RSI: 00007ffe43dc3d17 RDI: ffffc9001b9d7fc8
<4>[ 1941.845760] RBP: 00007ffe43dc3d17 R08: 0000000000005ceb R09: 00007f57ff9aa700
<4>[ 1941.845761] R10: 00007f57ff9aa9d0 R11: 0000000000000246 R12: 00007f57ff9aa698
<4>[ 1941.845761] R13: 00007f57ff9b5d00 R14: 0000559a1d5bb070 R15: 0000000000000005
<4>[ 1941.845761]  ? syscall_return_via_sysret+0x26/0x7f
<4>[ 1941.845761]  ? syscall_return_via_sysret+0x26/0x7f
<4>[ 1941.845762]  </NMI>
<4>[ 1941.845762]  <ENTRY_TRAMPOLINE>
<4>[ 1941.845762]  </ENTRY_TRAMPOLINE>
<4>[ 1941.845762] Modules linked in: unix_diag ip6table_filter sch_fq_codel tcp_diag inet_diag act_police act_gact cls_u32 sch_ingress netconsole configfs 8021q garp stp mrp llc amd64_edac_mod edac_mce_amd ghash_clmulni_intel ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler sp5100_tco i2c_piix4 k10temp tcp_htcp tcp_bbr kvm_amd ccp kvm irqbypass mpls_gso mpls_iptunnel mpls_router fou6 fou ip_tunnel ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core dummy msr ip6_tables ip_tables x_tables ip6_tunnel tunnel6 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw ttm pci_hyperv_intf tls drm

Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
---
 arch/x86/kernel/unwind_orc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 794fdef2501a..80b878772b86 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -343,7 +343,9 @@ static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
 	    (get_stack_info(addr, state->task, info, &state->stack_mask)))
 		return false;
 
-	return true;
+	/* Recheck range after stack info was updated */
+	return on_stack(info, addr, len);
+
 }
 
 static bool deref_stack_reg(struct unwind_state *state, unsigned long addr,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs
  2022-04-12  7:40 [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Dmitry Monakhov
@ 2022-04-12  7:40 ` Dmitry Monakhov
  2022-04-12 10:01   ` Peter Zijlstra
  2022-04-12 10:08 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2022-04-12  7:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, mingo, Dmitry Monakhov

For historical reasons we check only IRET_FRAME_OFFSET, but this check
is no longer valid because we also access the regs->sp field, which is
located beyond the IRET frame, so it is reasonable to validate the full structure.

Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
---
 arch/x86/kernel/unwind_orc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 80b878772b86..a249ecabe689 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -379,7 +379,7 @@ static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr
 {
 	struct pt_regs *regs = (void *)addr - IRET_FRAME_OFFSET;
 
-	if (!stack_access_ok(state, addr, IRET_FRAME_SIZE))
+	if (!stack_access_ok(state, addr, sizeof(struct pt_regs)))
 		return false;
 
 	*ip = READ_ONCE_NOCHECK(regs->ip);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs
  2022-04-12  7:40 ` [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs Dmitry Monakhov
@ 2022-04-12 10:01   ` Peter Zijlstra
  2022-04-12 10:57     ` Dmitry Monakhov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-04-12 10:01 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, x86, mingo

On Tue, Apr 12, 2022 at 10:40:04AM +0300, Dmitry Monakhov wrote:
> For historical reasons we check only IRET_FRAME_OFFSET, but this check
> is no longer valid because we also access the regs->sp field, which is
> located beyond the IRET frame, so it is reasonable to validate the full structure.

Uuuh, what? IRET frame is:

 ss, sp, flags, cs, ip

that very much includes sp.
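
For reference, roughly how the two macros relate (a sketch of the definitions
in arch/x86/include/asm/ptrace.h; check the header for the authoritative
version):

#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip))
#define IRET_FRAME_SIZE   (sizeof(struct pt_regs) - IRET_FRAME_OFFSET)

/*
 * deref_stack_iret_regs() is handed the address of the IRET frame itself,
 * i.e. &regs->ip, so checking IRET_FRAME_SIZE bytes from there covers
 * exactly ip, cs, flags, sp and ss: everything it goes on to read.
 */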

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated
  2022-04-12  7:40 [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Dmitry Monakhov
  2022-04-12  7:40 ` [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs Dmitry Monakhov
@ 2022-04-12 10:08 ` Peter Zijlstra
  2022-04-16  0:49   ` Josh Poimboeuf
  2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
  2022-04-12 10:11 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
  2022-04-16  0:46 ` Josh Poimboeuf
  3 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-04-12 10:08 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, x86, mingo, kim.phillips, Josh Poimboeuf

On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> get_stack_info() detects the stack type only from the start address, so we must
> check that the whole address range in question is covered by the detected stack.
> 
> Otherwise the following crash is possible:
> -> unwind_next_frame
>    case ORC_TYPE_REGS:
>      if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
>      -> deref_stack_regs
>        -> stack_access_ok  <- here addr is inside the stack range, but addr+len-1 is not, yet we still return success
>      *ip = READ_ONCE_NOCHECK(regs->ip); <- here we hit the stack guard page fault
> OOPS LOG:
> <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)


> <4>[ 1941.845751]  get_perf_callchain+0x10d/0x280
> <4>[ 1941.845751]  perf_callchain+0x6e/0x80
> <4>[ 1941.845752]  perf_prepare_sample+0x87/0x540
> <4>[ 1941.845752]  perf_event_output_forward+0x31/0x90
> <4>[ 1941.845753]  __perf_event_overflow+0x5a/0xf0
> <4>[ 1941.845754]  perf_ibs_handle_irq+0x340/0x5b0
> <4>[ 1941.845757]  perf_ibs_nmi_handler+0x34/0x60
> <4>[ 1941.845757]  nmi_handle+0x79/0x190

Urgh, this is another instance of trying to unwind an IP that no longer
matches the stack.

Fixing the unwinder bug is good, but arguably we should also fix this
IBS stuff; see 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)").

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated
  2022-04-12  7:40 [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Dmitry Monakhov
  2022-04-12  7:40 ` [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs Dmitry Monakhov
  2022-04-12 10:08 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
@ 2022-04-12 10:11 ` Peter Zijlstra
  2022-04-14 15:18   ` Josh Poimboeuf
  2022-04-16  0:46 ` Josh Poimboeuf
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-04-12 10:11 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, x86, mingo, Josh Poimboeuf


Also, guys, do we want something like so?

diff --git a/MAINTAINERS b/MAINTAINERS
index fd768d43e048..cb5fcb2a9cef 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14220,6 +14220,7 @@ M:	Peter Zijlstra <peterz@infradead.org>
 S:	Supported
 F:	tools/objtool/
 F:	include/linux/objtool.h
+F:	arch/x86/kernel/unwind*
 
 OCELOT ETHERNET SWITCH DRIVER
 M:	Vladimir Oltean <vladimir.oltean@nxp.com>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs
  2022-04-12 10:01   ` Peter Zijlstra
@ 2022-04-12 10:57     ` Dmitry Monakhov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2022-04-12 10:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, x86, mingo



> On Tue, Apr 12, 2022 at 10:40:04AM +0300, Dmitry Monakhov wrote:
> 
>> For historical reasons we check only IRET_FRAME_OFFSET, but this check
>> is no longer valid because we also access the regs->sp field, which is
>> located beyond the IRET frame, so it is reasonable to validate the full structure.
> 
> Uuuh, what? IRET frame is:
> 
> ss, sp, flags, cs, ip
> 
> that very much includes sp.
Oh. Indeed you are right. Sorry. Please ignore this patch.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated
  2022-04-12 10:11 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
@ 2022-04-14 15:18   ` Josh Poimboeuf
  0 siblings, 0 replies; 14+ messages in thread
From: Josh Poimboeuf @ 2022-04-14 15:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dmitry Monakhov, linux-kernel, x86, mingo

On Tue, Apr 12, 2022 at 12:11:58PM +0200, Peter Zijlstra wrote:
> 
> Also, guys, do we want something like so?
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fd768d43e048..cb5fcb2a9cef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14220,6 +14220,7 @@ M:	Peter Zijlstra <peterz@infradead.org>
>  S:	Supported
>  F:	tools/objtool/
>  F:	include/linux/objtool.h
> +F:	arch/x86/kernel/unwind*
>  
>  OCELOT ETHERNET SWITCH DRIVER
>  M:	Vladimir Oltean <vladimir.oltean@nxp.com>

I could have sworn we already had a separate MAINTAINERS entry for the
x86 unwinder; senility must be rapidly approaching.

-- 
Josh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated
  2022-04-12  7:40 [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Dmitry Monakhov
                   ` (2 preceding siblings ...)
  2022-04-12 10:11 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
@ 2022-04-16  0:46 ` Josh Poimboeuf
  3 siblings, 0 replies; 14+ messages in thread
From: Josh Poimboeuf @ 2022-04-16  0:46 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, x86, mingo

On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> get_stack_info() detects the stack type only from the start address, so we must
> check that the whole address range in question is covered by the detected stack.
> 
> Otherwise the following crash is possible:
> -> unwind_next_frame
>    case ORC_TYPE_REGS:
>      if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
>      -> deref_stack_regs
>        -> stack_access_ok  <- here addr is inside the stack range, but addr+len-1 is not, yet we still return success
>      *ip = READ_ONCE_NOCHECK(regs->ip); <- here we hit the stack guard page fault

Hi Dmitry,

Thanks for the patch.

As Peter mentioned, the root cause of the crash was that the stack was
changing while the unwinder was reading it.  But the patch is still
valid since the unwinder needs to be paranoid.

The commit log should have that background.

> OOPS LOG:
> <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
> <4>[ 1941.845744] kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
> <4>[ 1941.845744] CPU: 93 PID: 23787 Comm: context_switch1 Not tainted 5.4.145 #1
> <4>[ 1941.845745] Hardware name: XXXXXXXXXXXXXX
> <4>[ 1941.845746] RIP: 0010:unwind_next_frame+0x311/0x5b0
> <4>[ 1941.845746] Code: 01 0f 84 f6 01 00 00 0f 83 cc fe ff ff e9 f9 fe ff ff ba a8 00 00 00 4c 89 fe 48 89 df e8 57 fa ff ff 84 c0 0f 84 d9 00 00 00 <49> 8b 87 80 00 00 00 48 89 43 48 49 8b 87 98 00 00 00 4c 89 7b 50
> <4>[ 1941.845747] RSP: 0018:fffffe00012908f0 EFLAGS: 00010002
> <4>[ 1941.845748] RAX: 0000000000000001 RBX: fffffe0001290930 RCX: 0000000000000001
> <4>[ 1941.845748] RDX: 0000000000000002 RSI: ffff893d94719e80 RDI: ffffc9001b9d7fc8
> <4>[ 1941.845748] RBP: 0000000000000004 R08: ffffffff81a009bf R09: ffffffff824c7c20
> <4>[ 1941.845749] R10: ffffffff824c7c1c R11: 0000000000000023 R12: ffffffff826c9cde
> <4>[ 1941.845749] R13: ffffffff81a009d2 R14: fffffe0001289ff0 R15: ffffc9001b9d7fc8
> <4>[ 1941.845749] FS:  00007f57ff9aa700(0000) GS:ffff893f6f140000(0000) knlGS:0000000000000000
> <4>[ 1941.845750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[ 1941.845750] CR2: ffffc9001b9d8048 CR3: 000000bcd3936000 CR4: 0000000000340ee0
> <4>[ 1941.845750] Call Trace:
> <4>[ 1941.845750]  <NMI>
> <4>[ 1941.845751]  perf_callchain_kernel+0x12b/0x140
> <4>[ 1941.845751]  ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845751]  get_perf_callchain+0x10d/0x280
> <4>[ 1941.845751]  perf_callchain+0x6e/0x80
> <4>[ 1941.845752]  perf_prepare_sample+0x87/0x540
> <4>[ 1941.845752]  perf_event_output_forward+0x31/0x90
> <4>[ 1941.845752]  ? sched_clock+0x5/0x10
> <4>[ 1941.845752]  ? sched_clock_cpu+0xc/0xb0
> <4>[ 1941.845753]  ? arch_perf_update_userpage+0xd0/0xe0
> <4>[ 1941.845753]  ? sched_clock+0x5/0x10
> <4>[ 1941.845753]  ? sched_clock_cpu+0xc/0xb0
> <4>[ 1941.845753]  __perf_event_overflow+0x5a/0xf0
> <4>[ 1941.845754]  perf_ibs_handle_irq+0x340/0x5b0
> <4>[ 1941.845754]  ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845754]  ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845754]  ? __set_pte_vaddr+0x28/0x40
> <4>[ 1941.845755]  ? __set_pte_vaddr+0x28/0x40
> <4>[ 1941.845755]  ? set_pte_vaddr+0x3c/0x60
> <4>[ 1941.845755]  ? __native_set_fixmap+0x24/0x30
> <4>[ 1941.845755]  ? native_set_fixmap+0x3c/0xb0
> <4>[ 1941.845756]  ? ghes_copy_tofrom_phys+0x98/0x130
> <4>[ 1941.845756]  ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845756]  ? __ghes_peek_estatus.isra.16+0x49/0xb0
> <4>[ 1941.845756]  ? perf_ibs_nmi_handler+0x34/0x60
> <4>[ 1941.845757]  ? sched_clock+0x5/0x10
> <4>[ 1941.845757]  perf_ibs_nmi_handler+0x34/0x60
> <4>[ 1941.845757]  nmi_handle+0x79/0x190
> <4>[ 1941.845757]  default_do_nmi+0x3e/0x110
> <4>[ 1941.845758]  do_nmi+0x18d/0x1e0
> <4>[ 1941.845758]  end_repeat_nmi+0x16/0x50
> <4>[ 1941.845758] RIP: 0010:syscall_return_via_sysret+0x26/0x7f
> <4>[ 1941.845759] Code: 3e 09 00 00 41 5f 41 5e 41 5d 41 5c 5d 5b 5e 41 5a 41 59 41 58 58 5e 5a 5e 48 89 e7 65 48 8b 24 25 04 60 00 00 ff 77 28 ff 37 <50> eb 43 0f 20 df eb 34 48 89 f8 48 81 e7 ff 07 00 00 65 48 0f a3
> <4>[ 1941.845759] RSP: 0018:fffffe0001289ff0 EFLAGS: 00000046
> <4>[ 1941.845760] RAX: 0000000000000001 RBX: 0000559a21ff71d0 RCX: 00007f57ff286260
> <4>[ 1941.845760] RDX: 0000000000000001 RSI: 00007ffe43dc3d17 RDI: ffffc9001b9d7fc8
> <4>[ 1941.845760] RBP: 00007ffe43dc3d17 R08: 0000000000005ceb R09: 00007f57ff9aa700
> <4>[ 1941.845761] R10: 00007f57ff9aa9d0 R11: 0000000000000246 R12: 00007f57ff9aa698
> <4>[ 1941.845761] R13: 00007f57ff9b5d00 R14: 0000559a1d5bb070 R15: 0000000000000005
> <4>[ 1941.845761]  ? syscall_return_via_sysret+0x26/0x7f
> <4>[ 1941.845761]  ? syscall_return_via_sysret+0x26/0x7f
> <4>[ 1941.845762]  </NMI>
> <4>[ 1941.845762]  <ENTRY_TRAMPOLINE>
> <4>[ 1941.845762]  </ENTRY_TRAMPOLINE>
> <4>[ 1941.845762] Modules linked in: unix_diag ip6table_filter sch_fq_codel tcp_diag inet_diag act_police act_gact cls_u32 sch_ingress netconsole configfs 8021q garp stp mrp llc amd64_edac_mod edac_mce_amd ghash_clmulni_intel ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler sp5100_tco i2c_piix4 k10temp tcp_htcp tcp_bbr kvm_amd ccp kvm irqbypass mpls_gso mpls_iptunnel mpls_router fou6 fou ip_tunnel ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core dummy msr ip6_tables ip_tables x_tables ip6_tunnel tunnel6 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw ttm pci_hyperv_intf tls drm

This oops has way too much detail for the commit log and can be trimmed
down a lot.  See for example the backtraces section in
Documentation/process/submitting-patches.rst.

> Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
> ---
>  arch/x86/kernel/unwind_orc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 794fdef2501a..80b878772b86 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -343,7 +343,9 @@ static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
>  	    (get_stack_info(addr, state->task, info, &state->stack_mask)))
>  		return false;
>  
> -	return true;
> +	/* Recheck range after stack info was updated */
> +	return on_stack(info, addr, len);
> +
>  }

This punishes the typical case where the first on_stack() call
succeeded.  It only needs to be done if the first on_stack() failed.
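
Something like the below, perhaps (sketch only, not build-tested):

static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
			    size_t len)
{
	struct stack_info *info = &state->stack_info;
	void *addr = (void *)_addr;

	/* Fast path: the whole range is already on the known stack. */
	if (on_stack(info, addr, len))
		return true;

	/* Slow path: reclassify the stack, then recheck the full range. */
	return !get_stack_info(addr, state->task, info, &state->stack_mask) &&
	       on_stack(info, addr, len);
}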

-- 
Josh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated
  2022-04-12 10:08 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
@ 2022-04-16  0:49   ` Josh Poimboeuf
  2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
  1 sibling, 0 replies; 14+ messages in thread
From: Josh Poimboeuf @ 2022-04-16  0:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dmitry Monakhov, linux-kernel, x86, mingo, kim.phillips

On Tue, Apr 12, 2022 at 12:08:37PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> > get_stack_info() detects the stack type only from the start address, so we must
> > check that the whole address range in question is covered by the detected stack.
> > 
> > Otherwise the following crash is possible:
> > -> unwind_next_frame
> >    case ORC_TYPE_REGS:
> >      if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
> >      -> deref_stack_regs
> >        -> stack_access_ok  <- here addr is inside the stack range, but addr+len-1 is not, yet we still return success
> >      *ip = READ_ONCE_NOCHECK(regs->ip); <- here we hit the stack guard page fault
> > OOPS LOG:
> > <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
> 
> 
> > <4>[ 1941.845751]  get_perf_callchain+0x10d/0x280
> > <4>[ 1941.845751]  perf_callchain+0x6e/0x80
> > <4>[ 1941.845752]  perf_prepare_sample+0x87/0x540
> > <4>[ 1941.845752]  perf_event_output_forward+0x31/0x90
> > <4>[ 1941.845753]  __perf_event_overflow+0x5a/0xf0
> > <4>[ 1941.845754]  perf_ibs_handle_irq+0x340/0x5b0
> > <4>[ 1941.845757]  perf_ibs_nmi_handler+0x34/0x60
> > <4>[ 1941.845757]  nmi_handle+0x79/0x190
> 
> Urgh, this is another instance of trying to unwind an IP that no longer
> matches the stack.
> 
> Fixing the unwinder bug is good, but arguably we should also fix this
> IBS stuff; see 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)").

I remember that nastiness well.  So it's still broken?  Or is this a
regression?  Maybe we wouldn't have noticed it, except that it triggered
this unwinder bug?

-- 
Josh


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding
  2022-04-12 10:08 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
  2022-04-16  0:49   ` Josh Poimboeuf
@ 2022-04-29  5:14   ` Ravi Bangoria
  2022-04-29 20:28     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
                       ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Ravi Bangoria @ 2022-04-29  5:14 UTC (permalink / raw)
  To: peterz, dmtrmonakhov, jpoimboe
  Cc: ravi.bangoria, acme, mingo, mark.rutland, jolsa,
	alexander.shishkin, namhyung, tglx, bp, dave.hansen, hpa, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IbsOpRip is recorded when the IBS interrupt is triggered. But there is
a skid from the time the IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile the processor would have
moved ahead, and thus IbsOpRip will be inconsistent with the rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding the stack using the ORC unwinder, as it needs consistent rip,
rsp and rbp. Fix this by using the rip from the interrupt regs instead of
IbsOpRip for stack unwinding.
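
The reason setting __PERF_SAMPLE_CALLCHAIN_EARLY helps: perf_prepare_sample()
skips its own callchain capture when that flag is set and the PMU driver has
already recorded one, roughly (paraphrased from kernel/events/core.c, not
verbatim):

	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
		int size = 1;

		/* Skip if the PMU driver already captured it early. */
		if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
			data->callchain = perf_callchain(event, regs);

		size += data->callchain->nr;
		header->size += size * sizeof(u64);
	}

So by the time perf_prepare_sample() runs with the regs whose ip has been set
to IbsOpRip, data->callchain has already been filled from the unmodified
interrupt regs.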

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019d4b67..171941043f53 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
 	hwc->config_base = perf_ibs->msr;
 	hwc->config = config;
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack. Setting _EARLY flag
+	 * makes sure we unwind call-stack before perf sample rip is set to
+	 * IbsOpRip.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
 	return 0;
 }
 
@@ -687,6 +697,14 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		data.raw = &raw;
 	}
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		data.callchain = perf_callchain(event, iregs);
+
 	throttle = perf_event_overflow(event, &data, &regs);
 out:
 	if (throttle) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip: perf/core] perf/amd/ibs: Use interrupt regs ip for stack unwinding
  2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
@ 2022-04-29 20:28     ` tip-bot2 for Ravi Bangoria
  2022-05-02  6:07     ` [PATCH] " Namhyung Kim
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-04-29 20:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dmitry Monakhov, Peter Zijlstra, Ravi Bangoria, x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     1989455e0196f59741d81601284a6058acfaf225
Gitweb:        https://git.kernel.org/tip/1989455e0196f59741d81601284a6058acfaf225
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Fri, 29 Apr 2022 10:44:41 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 29 Apr 2022 11:06:28 +02:00

perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when the IBS interrupt is triggered. But there is
a skid from the time the IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile the processor would have
moved ahead, and thus IbsOpRip will be inconsistent with the rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding the stack using the ORC unwinder, as it needs consistent rip,
rsp and rbp. Fix this by using the rip from the interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019..11e8b49 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
 	hwc->config_base = perf_ibs->msr;
 	hwc->config = config;
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack. Setting _EARLY flag
+	 * makes sure we unwind call-stack before perf sample rip is set to
+	 * IbsOpRip.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
 	return 0;
 }
 
@@ -687,6 +697,14 @@ fail:
 		data.raw = &raw;
 	}
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		data.callchain = perf_callchain(event, iregs);
+
 	throttle = perf_event_overflow(event, &data, &regs);
 out:
 	if (throttle) {

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding
  2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
  2022-04-29 20:28     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
@ 2022-05-02  6:07     ` Namhyung Kim
  2022-05-04  9:23     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-05-10  9:11     ` tip-bot2 for Ravi Bangoria
  3 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2022-05-02  6:07 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Dmitry Monakhov, Josh Poimboeuf,
	Arnaldo Carvalho de Melo, Ingo Molnar, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Thomas Gleixner, Borislav Petkov,
	dave.hansen, H. Peter Anvin, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, Kim Phillips, santosh.shukla,
	Stephane Eranian

Hello,

On Thu, Apr 28, 2022 at 10:15 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> IbsOpRip is recorded when the IBS interrupt is triggered. But there is
> a skid from the time the IBS interrupt gets triggered to the time the
> interrupt is presented to the core. Meanwhile the processor would have
> moved ahead, and thus IbsOpRip will be inconsistent with the rsp and rbp
> recorded as part of the interrupt regs. This causes issues while
> unwinding the stack using the ORC unwinder, as it needs consistent rip,
> rsp and rbp. Fix this by using the rip from the interrupt regs instead of
> IbsOpRip for stack unwinding.
>
> Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
> Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 9739019d4b67..171941043f53 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
>         hwc->config_base = perf_ibs->msr;
>         hwc->config = config;
>
> +       /*
> +        * rip recorded by IbsOpRip will not be consistent with rsp and rbp
> +        * recorded as part of interrupt regs. Thus we need to use rip from
> +        * interrupt regs while unwinding call stack. Setting _EARLY flag
> +        * makes sure we unwind call-stack before perf sample rip is set to
> +        * IbsOpRip.
> +        */
> +       if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> +               event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
> +
>         return 0;
>  }
>
> @@ -687,6 +697,14 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>                 data.raw = &raw;
>         }
>
> +       /*
> +        * rip recorded by IbsOpRip will not be consistent with rsp and rbp
> +        * recorded as part of interrupt regs. Thus we need to use rip from
> +        * interrupt regs while unwinding call stack.
> +        */
> +       if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> +               data.callchain = perf_callchain(event, iregs);
> +
>         throttle = perf_event_overflow(event, &data, &regs);
>  out:
>         if (throttle) {
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip: perf/core] perf/amd/ibs: Use interrupt regs ip for stack unwinding
  2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
  2022-04-29 20:28     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-05-02  6:07     ` [PATCH] " Namhyung Kim
@ 2022-05-04  9:23     ` tip-bot2 for Ravi Bangoria
  2022-05-10  9:11     ` tip-bot2 for Ravi Bangoria
  3 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-04  9:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dmitry Monakhov, Peter Zijlstra, Ravi Bangoria, x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     bd24325684029a48f20a188b899eb84900d0bc9c
Gitweb:        https://git.kernel.org/tip/bd24325684029a48f20a188b899eb84900d0bc9c
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Fri, 29 Apr 2022 10:44:41 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 04 May 2022 11:18:27 +02:00

perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when the IBS interrupt is triggered. But there is
a skid from the time the IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile the processor would have
moved ahead, and thus IbsOpRip will be inconsistent with the rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding the stack using the ORC unwinder, as it needs consistent rip,
rsp and rbp. Fix this by using the rip from the interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019..11e8b49 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
 	hwc->config_base = perf_ibs->msr;
 	hwc->config = config;
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack. Setting _EARLY flag
+	 * makes sure we unwind call-stack before perf sample rip is set to
+	 * IbsOpRip.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
 	return 0;
 }
 
@@ -687,6 +697,14 @@ fail:
 		data.raw = &raw;
 	}
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		data.callchain = perf_callchain(event, iregs);
+
 	throttle = perf_event_overflow(event, &data, &regs);
 out:
 	if (throttle) {

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip: perf/core] perf/amd/ibs: Use interrupt regs ip for stack unwinding
  2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
                       ` (2 preceding siblings ...)
  2022-05-04  9:23     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
@ 2022-05-10  9:11     ` tip-bot2 for Ravi Bangoria
  3 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-10  9:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dmitry Monakhov, Peter Zijlstra, Ravi Bangoria, x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     3d47083b9ff46863e8374ad3bb5edb5e464c75f8
Gitweb:        https://git.kernel.org/tip/3d47083b9ff46863e8374ad3bb5edb5e464c75f8
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Fri, 29 Apr 2022 10:44:41 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 May 2022 11:00:45 +02:00

perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when the IBS interrupt is triggered. But there is
a skid from the time the IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile the processor would have
moved ahead, and thus IbsOpRip will be inconsistent with the rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding the stack using the ORC unwinder, as it needs consistent rip,
rsp and rbp. Fix this by using the rip from the interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019..11e8b49 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
 	hwc->config_base = perf_ibs->msr;
 	hwc->config = config;
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack. Setting _EARLY flag
+	 * makes sure we unwind call-stack before perf sample rip is set to
+	 * IbsOpRip.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
 	return 0;
 }
 
@@ -687,6 +697,14 @@ fail:
 		data.raw = &raw;
 	}
 
+	/*
+	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+	 * recorded as part of interrupt regs. Thus we need to use rip from
+	 * interrupt regs while unwinding call stack.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+		data.callchain = perf_callchain(event, iregs);
+
 	throttle = perf_event_overflow(event, &data, &regs);
 out:
 	if (throttle) {

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-05-10  9:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12  7:40 [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Dmitry Monakhov
2022-04-12  7:40 ` [PATCH 2/2] x86/unwind/orc: Fix address check size for deref_stack_iret_regs Dmitry Monakhov
2022-04-12 10:01   ` Peter Zijlstra
2022-04-12 10:57     ` Dmitry Monakhov
2022-04-12 10:08 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
2022-04-16  0:49   ` Josh Poimboeuf
2022-04-29  5:14   ` [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding Ravi Bangoria
2022-04-29 20:28     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-05-02  6:07     ` [PATCH] " Namhyung Kim
2022-05-04  9:23     ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-05-10  9:11     ` tip-bot2 for Ravi Bangoria
2022-04-12 10:11 ` [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated Peter Zijlstra
2022-04-14 15:18   ` Josh Poimboeuf
2022-04-16  0:46 ` Josh Poimboeuf
