[PATCH] x86/orc: Don't bail on stack overflow

* [PATCH] x86/orc: Don't bail on stack overflow
@ 2017-11-25 17:28 Andy Lutomirski
  2017-11-25 18:26 ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2017-11-25 17:28 UTC (permalink / raw)
  To: X86 ML
  Cc: Borislav Petkov, linux-kernel, Brian Gerst, Dave Hansen,
	Linus Torvalds, Josh Poimboeuf, Andy Lutomirski

If we overflow the stack into a guard page and then try to unwind
it with ORC, it should work perfectly: by construction, there can't
be any meaningful data in the guard page because no writes to the
guard page will have succeeded.

ORC seems entirely capable of unwinding in this situation, except
that it doesn't even try.  Adjust its initial stack check so that
it's willing to try unwinding.

I tested this by intentionally overflowing the task stack.  The
result is an accurate call trace instead of a trace consisting
purely of '?' entries.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Hi all-

Ingo, this would have fixed half the debugging problem you had, I think.
To really nail it, we'd want some kind of magic to annotate the trace
so that page_fault (and async_page_fault) entries show CR2 and error_code.

Josh, any ideas of how to do that cleanly?  We could easily hard-code it
in the OOPS unwinder, I guess.

arch/x86/kernel/unwind_orc.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a3f973b2c97a..7f6e3935666b 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -553,8 +553,18 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	}
 
 	if (get_stack_info((unsigned long *)state->sp, state->task,
-			   &state->stack_info, &state->stack_mask))
-		return;
+			   &state->stack_info, &state->stack_mask)) {
+		/*
+		 * We weren't on a valid stack.  It's possible that
+		 * we overflowed a valid stack into a guard page.
+		 * See if the next page up is valid so that we can
+		 * generate some kind of backtrace if this happens.
+		 */
+		void *next_page = (void *)PAGE_ALIGN((unsigned long)regs->sp);
+		if (get_stack_info(next_page, state->task, &state->stack_info,
+				   &state->stack_mask))
+			return;
+	}
 
 	/*
 	 * The caller can provide the address of the first frame directly
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread