linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/unwind/orc: fix inactive tasks with sp in sp
@ 2020-10-07  8:19 Jiri Slaby
  2020-10-07 14:54 ` Josh Poimboeuf
  0 siblings, 1 reply; 3+ messages in thread
From: Jiri Slaby @ 2020-10-07  8:19 UTC (permalink / raw)
  To: jpoimboe; +Cc: linux-kernel, Jiri Slaby, Miroslav Benes

gcc-10 optimizes the scheduler code differently than its predecessors,
depending on DEBUG_SECTION_MISMATCH=y config -- the config sets
-fno-inline-functions-called-once. Until now, __schedule contained the
usual prologue (push bp; mov sp,bp). ORC unwinder simply picked sp from
bp and unwound from __schedule just perfectly:
$ cat /proc/1/stack
[<0>] ep_poll+0x3e9/0x450
[<0>] do_epoll_wait+0xaa/0xc0
[<0>] __x64_sys_epoll_wait+0x1a/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

But now, w/o bp prologue:
$ cat /proc/1/stack
<nothing>

The orc entry of the point in __schedule is:
sp:sp+88 bp:last_sp-48 type:call end:0

In this case, nobody subtracts sizeof "struct inactive_task_frame". The
struct is put on the stack by __switch_to_asm. Only then __switch_to_asm
stores sp to task->thread.sp. But we start unwinding from a point in
__schedule (stored in frame->ret_addr by 'call') and not in
__switch_to_asm.

So for these example values in __unwind_start:
sp=ffff94b50001fdc8 bp=ffff8e1f41d29340 ip=__schedule+0x1f0

The stack is:
 ffff94b50001fdc8: ffff8e1f41578000 # struct inactive_task_frame
 ffff94b50001fdd0: 0000000000000000
 ffff94b50001fdd8: ffff8e1f41d29340
 ffff94b50001fde0: ffff8e1f41611d40 # ...
 ffff94b50001fde8: ffffffff93c41920 # bx
 ffff94b50001fdf0: ffff8e1f41d29340 # bp
 ffff94b50001fdf8: ffffffff9376cad0 # end of the struct (ret_addr)

0xffffffff9376cad0 is __schedule+0x1f0 (after the call to
__switch_to_asm).  Now follow those 88 bytes from the ORC entry (sp+88).
The entry is correct, __schedule really pushes 48 bytes (8*7) + 32 bytes
via subq to store some local values (like 4U below). So to unwind, look
at the offset 88-sizeof(long) = 0x50 from here:

 ffff94b50001fe00: ffff8e1f41578618
 ffff94b50001fe08: 00000cc000000255
 ffff94b50001fe10: 0000000500000004
 ffff94b50001fe18: 7793fab6956b2d00 # NOTE (see below)
 ffff94b50001fe20: ffff8e1f41578000
 ffff94b50001fe28: ffff8e1f41578000
 ffff94b50001fe30: ffff8e1f41578000
 ffff94b50001fe38: ffff8e1f41578000
 ffff94b50001fe40: ffff94b50001fed8
 ffff94b50001fe48: ffff8e1f41577ff0
 ffff94b50001fe50: ffffffff9376cf12

Here               ^^^^^^^^^^^^^^^^ is the correct ret addr from
__schedule. It translates to schedule+0x42 (insn after a call to
__schedule).

BUT, unwind_next_frame tries to take the address starting from
0xffff94b50001fdc8. That is exactly from thread.sp+88-sizeof(long) =
0xffff94b50001fdc8+88-8 = 0xffff94b50001fe18, which is garbage marked as
NOTE above. So this quits the unwinding as 7793fab6956b2d00 is obviously
not a kernel address.

There was a fix to skip 'struct inactive_task_frame' in
unwind_get_return_address_ptr in commit 187b96db5ca7 ("x86/unwind/orc:
Fix unwind_get_return_address_ptr() for inactive tasks").

But we need to skip the struct already in the unwinder proper. So
subtract the size (increase the stack pointer) of the structure in
__unwind_start directly. This allows for removal of the code added by
commit 187b96db5ca7 completely, as the address is now at
'(unsigned long *)state->sp - 1', the same as in the generic case.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Bug: https://bugzilla.suse.com/show_bug.cgi?id=1176907
---
 arch/x86/kernel/unwind_orc.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 6a339ce328e0..4492465fd77b 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -321,19 +321,12 @@ EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
 unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
 {
-	struct task_struct *task = state->task;
-
 	if (unwind_done(state))
 		return NULL;
 
 	if (state->regs)
 		return &state->regs->ip;
 
-	if (task != current && state->sp == task->thread.sp) {
-		struct inactive_task_frame *frame = (void *)task->thread.sp;
-		return &frame->ret_addr;
-	}
-
 	if (state->sp)
 		return (unsigned long *)state->sp - 1;
 
@@ -663,7 +656,13 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	} else {
 		struct inactive_task_frame *frame = (void *)task->thread.sp;
 
-		state->sp = task->thread.sp;
+		/*
+		 * @ret_addr is in __schedule _before_ the @frame is pushed to
+		 * the stack, but @thread.sp is saved in __switch_to_asm only
+		 * _after_ saving the @frame, so subtract the @frame size, i.e.
+		 * add it to @thread.sp.
+		 */
+		state->sp = task->thread.sp + sizeof(*frame);
 		state->bp = READ_ONCE_NOCHECK(frame->bp);
 		state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
 		state->signal = (void *)state->ip == ret_from_fork;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/unwind/orc: fix inactive tasks with sp in sp
  2020-10-07  8:19 [PATCH] x86/unwind/orc: fix inactive tasks with sp in sp Jiri Slaby
@ 2020-10-07 14:54 ` Josh Poimboeuf
  2020-10-14  5:08   ` Jiri Slaby
  0 siblings, 1 reply; 3+ messages in thread
From: Josh Poimboeuf @ 2020-10-07 14:54 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-kernel, Miroslav Benes

-ENOPARSE on $SUBJECT.

Also please address it to x86@kernel.org, I think the tip maintainers
can pick up the fix directly.

Also it might be a good idea to Cc the live-patching mailing list, I
presume this causes a livepatch stall?

On Wed, Oct 07, 2020 at 10:19:09AM +0200, Jiri Slaby wrote:
> gcc-10 optimizes the scheduler code differently than its predecessors,
> depending on DEBUG_SECTION_MISMATCH=y config -- the config sets
> -fno-inline-functions-called-once.

Weird.  Was GCC ignoring this flag before?

> @@ -663,7 +656,13 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
>  	} else {
>  		struct inactive_task_frame *frame = (void *)task->thread.sp;
>  
> -		state->sp = task->thread.sp;
> +		/*
> +		 * @ret_addr is in __schedule _before_ the @frame is pushed to
> +		 * the stack, but @thread.sp is saved in __switch_to_asm only
> +		 * _after_ saving the @frame, so subtract the @frame size, i.e.
> +		 * add it to @thread.sp.
> +		 */
> +		state->sp = task->thread.sp + sizeof(*frame);

IMO, the code speaks for itself and the comment may be superfluous.

Otherwise it looks good to me.  Thanks for fixing it!

-- 
Josh


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/unwind/orc: fix inactive tasks with sp in sp
  2020-10-07 14:54 ` Josh Poimboeuf
@ 2020-10-14  5:08   ` Jiri Slaby
  0 siblings, 0 replies; 3+ messages in thread
From: Jiri Slaby @ 2020-10-14  5:08 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: linux-kernel, Miroslav Benes

On 07. 10. 20, 16:54, Josh Poimboeuf wrote:
> -ENOPARSE on $SUBJECT.
> 
> Also please address it to x86@kernel.org, I think the tip maintainers
> can pick up the fix directly.

Hmm, weird, I must have sent an older version as my current patch in the 
tree has:
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

> Also it might be a good idea to Cc the live-patching mailing list, I
> presume this causes a livepatch stall?
> 
> On Wed, Oct 07, 2020 at 10:19:09AM +0200, Jiri Slaby wrote:
>> gcc-10 optimizes the scheduler code differently than its predecessors,
>> depending on DEBUG_SECTION_MISMATCH=y config -- the config sets
>> -fno-inline-functions-called-once.
> 
> Weird.  Was GCC ignoring this flag before?

gcc 7 generated the earlier mentioned prologue (push bp; mov sp,bp). So 
we extract stack pointer from bp. gcc 10 no longer generates the 
prologue in some of the standalone functions. That's the difference. So 
we started extracting stack pointer from sp which contains more than we 
expect.

And the problem also (obviously) dismisses when gcc (even 10) inlines 
the function.

>> @@ -663,7 +656,13 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
>>   	} else {
>>   		struct inactive_task_frame *frame = (void *)task->thread.sp;
>>   
>> -		state->sp = task->thread.sp;
>> +		/*
>> +		 * @ret_addr is in __schedule _before_ the @frame is pushed to
>> +		 * the stack, but @thread.sp is saved in __switch_to_asm only
>> +		 * _after_ saving the @frame, so subtract the @frame size, i.e.
>> +		 * add it to @thread.sp.
>> +		 */
>> +		state->sp = task->thread.sp + sizeof(*frame);
> 
> IMO, the code speaks for itself and the comment may be superfluous.
> 
> Otherwise it looks good to me.  Thanks for fixing it!

OK, will resend the correct and fixed version.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-10-14  5:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-07  8:19 [PATCH] x86/unwind/orc: fix inactive tasks with sp in sp Jiri Slaby
2020-10-07 14:54 ` Josh Poimboeuf
2020-10-14  5:08   ` Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).