Live-Patching Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 0/4] s390/livepatch: Implement reliable stack tracing for the consistency model
@ 2019-11-06  9:55 Miroslav Benes
  2019-11-06  9:55 ` [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() Miroslav Benes
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Miroslav Benes @ 2019-11-06  9:55 UTC (permalink / raw)
  To: heiko.carstens, gor, borntraeger, jpoimboe, joe.lawrence
  Cc: linux-s390, linux-kernel, jikos, pmladek, nstange, live-patching,
	Miroslav Benes

The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.

The former should be guaranteed (see 4/4 for details), the latter is
implemented by the patch set (mainly 4/4).

It passes livepatch kselftests (with timeout disabled, see
https://lore.kernel.org/live-patching/20191025115041.23186-1-mbenes@suse.cz/)
and tests from https://github.com/lpechacek/qa_test_klp.

Based on v5.4-rc6. That is why I left 1/4 patch in the series despite
its presence in s390/features branch. The context of the rest of the
series could be preserved as such and changes in s390/fixes present in
v5.4-rc6 are taken into account.

Changes since v2:
- unwind_next_frame() is split to several functions to make it easier to
  review and prepare it for additional changes
- unwind_next_frame_reliable() merged into the existing
  unwind_next_frame()

Changes since v1 and questions follow:
- rebased on top of v5.4-rc5. It also meant rebasing on top of
  ARCH_STACKWALK, which made the outcome nicer and addressed some of
  Joe's remarks.
- sp alignment is checked in both _start and _next_frame cases
- ftrace_graph_ret_addr() calling cleanup

- I tried to use the existing infrastructure as much as possible with
  one exception. I kept unwind_next_frame_reliable() next to the
  ordinary unwind_next_frame(). I did not come up with a nice solution
  how to integrate it. The reliable unwinding is executed on a task
  stack only, which leads to a nice simplification. My integration
  attempts only obfuscated the existing unwind_next_frame() which is
  already not easy to read. Ideas are definitely welcome.

- although not nice (Josh mentioned it in his review), I kept checking
  for kretprobe_trampoline in the main loop. It could go into _start and
  _next_frame callbacks, but more changes would be required (changing a
  function return type, ordinary unwinding does not need it). So I did
  not think it was worth it. I could change it in v3, of course.

Miroslav Benes (4):
  s390/unwind: drop unnecessary code around calling
    ftrace_graph_ret_addr()
  s390/unwind: split unwind_next_frame() to several functions
  s390/unwind: prepare the unwinding interface for reliable stack traces
  s390/livepatch: Implement reliable stack tracing for the consistency
    model

 arch/s390/Kconfig                  |   1 +
 arch/s390/include/asm/stacktrace.h |   3 +-
 arch/s390/include/asm/unwind.h     |  18 +--
 arch/s390/kernel/dumpstack.c       |  16 ++-
 arch/s390/kernel/perf_event.c      |   2 +-
 arch/s390/kernel/stacktrace.c      |  48 ++++++-
 arch/s390/kernel/unwind_bc.c       | 194 ++++++++++++++++++++---------
 arch/s390/oprofile/init.c          |   2 +-
 8 files changed, 208 insertions(+), 76 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
  2019-11-06  9:55 [PATCH v3 0/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
@ 2019-11-06  9:55 ` Miroslav Benes
  2019-11-28 16:51   ` Vasily Gorbik
  2019-11-06  9:55 ` [PATCH v3 2/4] s390/unwind: split unwind_next_frame() to several functions Miroslav Benes
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Miroslav Benes @ 2019-11-06  9:55 UTC (permalink / raw)
  To: heiko.carstens, gor, borntraeger, jpoimboe, joe.lawrence
  Cc: linux-s390, linux-kernel, jikos, pmladek, nstange, live-patching,
	Miroslav Benes

The current code around calling ftrace_graph_ret_addr() is ifdeffed and
also tests if ftrace redirection is present on stack.
ftrace_graph_ret_addr() however performs the test internally and there
is a version for !CONFIG_FUNCTION_GRAPH_TRACER as well. The unnecessary
code can thus be dropped.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 arch/s390/kernel/unwind_bc.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index a8204f952315..5a78deacb972 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -85,12 +85,8 @@ bool unwind_next_frame(struct unwind_state *state)
 		}
 	}
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	/* Decode any ftrace redirection */
-	if (ip == (unsigned long) return_to_handler)
-		ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
-					   ip, (void *) sp);
-#endif
+	ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				   ip, (void *) sp);
 
 	/* Update unwind state */
 	state->sp = sp;
@@ -147,12 +143,8 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 		reuse_sp = false;
 	}
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	/* Decode any ftrace redirection */
-	if (ip == (unsigned long) return_to_handler)
-		ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
-					   ip, NULL);
-#endif
+	ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+				   ip, NULL);
 
 	/* Update unwind state */
 	state->sp = sp;
-- 
2.23.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 2/4] s390/unwind: split unwind_next_frame() to several functions
  2019-11-06  9:55 [PATCH v3 0/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
  2019-11-06  9:55 ` [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() Miroslav Benes
@ 2019-11-06  9:55 ` Miroslav Benes
  2019-11-06  9:56 ` [PATCH v3 3/4] s390/unwind: prepare the unwinding interface for reliable stack traces Miroslav Benes
  2019-11-06  9:56 ` [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
  3 siblings, 0 replies; 12+ messages in thread
From: Miroslav Benes @ 2019-11-06  9:55 UTC (permalink / raw)
  To: heiko.carstens, gor, borntraeger, jpoimboe, joe.lawrence
  Cc: linux-s390, linux-kernel, jikos, pmladek, nstange, live-patching,
	Miroslav Benes

Function unwind_next_frame() becomes less readable with each new
change. Split it to several functions to amend it and prepare for new
additions.

No functional change.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 arch/s390/kernel/unwind_bc.c | 136 ++++++++++++++++++++++-------------
 1 file changed, 88 insertions(+), 48 deletions(-)

diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 5a78deacb972..96da99ec7b59 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -36,55 +36,10 @@ static bool update_stack_info(struct unwind_state *state, unsigned long sp)
 	return true;
 }
 
-bool unwind_next_frame(struct unwind_state *state)
+static bool unwind_update_state(struct unwind_state *state,
+				unsigned long sp, unsigned long ip,
+				struct pt_regs *regs, bool reliable)
 {
-	struct stack_info *info = &state->stack_info;
-	struct stack_frame *sf;
-	struct pt_regs *regs;
-	unsigned long sp, ip;
-	bool reliable;
-
-	regs = state->regs;
-	if (unlikely(regs)) {
-		if (state->reuse_sp) {
-			sp = state->sp;
-			state->reuse_sp = false;
-		} else {
-			sp = READ_ONCE_NOCHECK(regs->gprs[15]);
-			if (unlikely(outside_of_stack(state, sp))) {
-				if (!update_stack_info(state, sp))
-					goto out_err;
-			}
-		}
-		sf = (struct stack_frame *) sp;
-		ip = READ_ONCE_NOCHECK(sf->gprs[8]);
-		reliable = false;
-		regs = NULL;
-	} else {
-		sf = (struct stack_frame *) state->sp;
-		sp = READ_ONCE_NOCHECK(sf->back_chain);
-		if (likely(sp)) {
-			/* Non-zero back-chain points to the previous frame */
-			if (unlikely(outside_of_stack(state, sp))) {
-				if (!update_stack_info(state, sp))
-					goto out_err;
-			}
-			sf = (struct stack_frame *) sp;
-			ip = READ_ONCE_NOCHECK(sf->gprs[8]);
-			reliable = true;
-		} else {
-			/* No back-chain, look for a pt_regs structure */
-			sp = state->sp + STACK_FRAME_OVERHEAD;
-			if (!on_stack(info, sp, sizeof(struct pt_regs)))
-				goto out_stop;
-			regs = (struct pt_regs *) sp;
-			if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
-				goto out_stop;
-			ip = READ_ONCE_NOCHECK(regs->psw.addr);
-			reliable = true;
-		}
-	}
-
 	ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
 				   ip, (void *) sp);
 
@@ -94,13 +49,98 @@ bool unwind_next_frame(struct unwind_state *state)
 	state->regs = regs;
 	state->reliable = reliable;
 	return true;
+}
+
+static bool unwind_use_regs(struct unwind_state *state)
+{
+	struct stack_frame *sf;
+	unsigned long sp, ip;
+	struct pt_regs *regs = state->regs;
+
+	if (state->reuse_sp) {
+		sp = state->sp;
+		state->reuse_sp = false;
+	} else {
+		sp = READ_ONCE_NOCHECK(regs->gprs[15]);
+		if (unlikely(outside_of_stack(state, sp))) {
+			if (!update_stack_info(state, sp))
+				goto out_err;
+		}
+	}
+
+	sf = (struct stack_frame *) sp;
+	ip = READ_ONCE_NOCHECK(sf->gprs[8]);
+
+	return unwind_update_state(state, sp, ip, NULL, false);
+
+out_err:
+	state->error = true;
+	state->stack_info.type = STACK_TYPE_UNKNOWN;
+	return false;
+}
+
+static bool unwind_use_frame(struct unwind_state *state, unsigned long sp)
+{
+	struct stack_frame *sf;
+	unsigned long ip;
+
+	if (unlikely(outside_of_stack(state, sp))) {
+		if (!update_stack_info(state, sp))
+			goto out_err;
+	}
+
+	sf = (struct stack_frame *) sp;
+	ip = READ_ONCE_NOCHECK(sf->gprs[8]);
+
+	return unwind_update_state(state, sp, ip, NULL, true);
 
 out_err:
 	state->error = true;
+	state->stack_info.type = STACK_TYPE_UNKNOWN;
+	return false;
+}
+
+static bool unwind_look_for_regs(struct unwind_state *state)
+{
+	struct stack_info *info = &state->stack_info;
+	struct pt_regs *regs;
+	unsigned long sp, ip;
+
+	sp = state->sp + STACK_FRAME_OVERHEAD;
+	if (!on_stack(info, sp, sizeof(struct pt_regs)))
+		goto out_stop;
+
+	regs = (struct pt_regs *) sp;
+	if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
+		goto out_stop;
+
+	ip = READ_ONCE_NOCHECK(regs->psw.addr);
+
+	return unwind_update_state(state, sp, ip, regs, true);
+
 out_stop:
 	state->stack_info.type = STACK_TYPE_UNKNOWN;
 	return false;
 }
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	struct stack_frame *sf;
+	unsigned long sp;
+
+	if (unlikely(state->regs))
+		return unwind_use_regs(state);
+
+	sf = (struct stack_frame *) state->sp;
+	sp = READ_ONCE_NOCHECK(sf->back_chain);
+
+	/* Non-zero back-chain points to the previous frame */
+	if (likely(sp))
+		return unwind_use_frame(state, sp);
+
+	/* No back-chain, look for a pt_regs structure */
+	return unwind_look_for_regs(state);
+}
 EXPORT_SYMBOL_GPL(unwind_next_frame);
 
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 3/4] s390/unwind: prepare the unwinding interface for reliable stack traces
  2019-11-06  9:55 [PATCH v3 0/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
  2019-11-06  9:55 ` [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() Miroslav Benes
  2019-11-06  9:55 ` [PATCH v3 2/4] s390/unwind: split unwind_next_frame() to several functions Miroslav Benes
@ 2019-11-06  9:56 ` Miroslav Benes
  2019-11-06  9:56 ` [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
  3 siblings, 0 replies; 12+ messages in thread
From: Miroslav Benes @ 2019-11-06  9:56 UTC (permalink / raw)
  To: heiko.carstens, gor, borntraeger, jpoimboe, joe.lawrence
  Cc: linux-s390, linux-kernel, jikos, pmladek, nstange, live-patching,
	Miroslav Benes

The reliable stack traces support will require to perform more actions
and some tasks differently. Add a new parameter to respective functions
and propagate it where appropriate.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 arch/s390/include/asm/stacktrace.h |  3 ++-
 arch/s390/include/asm/unwind.h     | 18 ++++++++++--------
 arch/s390/kernel/dumpstack.c       |  5 +++--
 arch/s390/kernel/perf_event.c      |  2 +-
 arch/s390/kernel/stacktrace.c      |  2 +-
 arch/s390/kernel/unwind_bc.c       | 19 +++++++++++--------
 arch/s390/oprofile/init.c          |  2 +-
 7 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
index 0ae4bbf7779c..f033bb70a8db 100644
--- a/arch/s390/include/asm/stacktrace.h
+++ b/arch/s390/include/asm/stacktrace.h
@@ -21,7 +21,8 @@ struct stack_info {
 
 const char *stack_type_name(enum stack_type type);
 int get_stack_info(unsigned long sp, struct task_struct *task,
-		   struct stack_info *info, unsigned long *visit_mask);
+		   struct stack_info *info, unsigned long *visit_mask,
+		   bool unwind_reliable);
 
 static inline bool on_stack(struct stack_info *info,
 			    unsigned long addr, size_t len)
diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
index eaaefeceef6f..4c36e200404e 100644
--- a/arch/s390/include/asm/unwind.h
+++ b/arch/s390/include/asm/unwind.h
@@ -42,8 +42,9 @@ struct unwind_state {
 };
 
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
-		    struct pt_regs *regs, unsigned long first_frame);
-bool unwind_next_frame(struct unwind_state *state);
+		    struct pt_regs *regs, unsigned long first_frame,
+		    bool unwind_reliable);
+bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable);
 unsigned long unwind_get_return_address(struct unwind_state *state);
 
 static inline bool unwind_done(struct unwind_state *state)
@@ -59,10 +60,11 @@ static inline bool unwind_error(struct unwind_state *state)
 static inline void unwind_start(struct unwind_state *state,
 				struct task_struct *task,
 				struct pt_regs *regs,
-				unsigned long sp)
+				unsigned long sp,
+				bool unwind_reliable)
 {
 	sp = sp ? : get_stack_pointer(task, regs);
-	__unwind_start(state, task, regs, sp);
+	__unwind_start(state, task, regs, sp, unwind_reliable);
 }
 
 static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
@@ -70,10 +72,10 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 	return unwind_done(state) ? NULL : state->regs;
 }
 
-#define unwind_for_each_frame(state, task, regs, first_frame)	\
-	for (unwind_start(state, task, regs, first_frame);	\
-	     !unwind_done(state);				\
-	     unwind_next_frame(state))
+#define unwind_for_each_frame(state, task, regs, first_frame, unwind_reliable)	\
+	for (unwind_start(state, task, regs, first_frame, unwind_reliable);	\
+	     !unwind_done(state);						\
+	     unwind_next_frame(state, unwind_reliable))
 
 static inline void unwind_init(void) {}
 static inline void unwind_module_init(struct module *mod, void *orc_ip,
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 34bdc60c0b11..1ee19e6336cd 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -88,7 +88,8 @@ static bool in_restart_stack(unsigned long sp, struct stack_info *info)
 }
 
 int get_stack_info(unsigned long sp, struct task_struct *task,
-		   struct stack_info *info, unsigned long *visit_mask)
+		   struct stack_info *info, unsigned long *visit_mask,
+		   bool unwind_reliable)
 {
 	if (!sp)
 		goto unknown;
@@ -130,7 +131,7 @@ void show_stack(struct task_struct *task, unsigned long *stack)
 	printk("Call Trace:\n");
 	if (!task)
 		task = current;
-	unwind_for_each_frame(&state, task, NULL, (unsigned long) stack)
+	unwind_for_each_frame(&state, task, NULL, (unsigned long) stack, false)
 		printk(state.reliable ? " [<%016lx>] %pSR \n" :
 					"([<%016lx>] %pSR)\n",
 		       state.ip, (void *) state.ip);
diff --git a/arch/s390/kernel/perf_event.c b/arch/s390/kernel/perf_event.c
index fcb6c2e92b07..7e81db0883e1 100644
--- a/arch/s390/kernel/perf_event.c
+++ b/arch/s390/kernel/perf_event.c
@@ -225,7 +225,7 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 {
 	struct unwind_state state;
 
-	unwind_for_each_frame(&state, current, regs, 0)
+	unwind_for_each_frame(&state, current, regs, 0, false)
 		perf_callchain_store(entry, state.ip);
 }
 
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index f8fc4f8aef9b..751c136172f7 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -16,7 +16,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 	struct unwind_state state;
 	unsigned long addr;
 
-	unwind_for_each_frame(&state, task, regs, 0) {
+	unwind_for_each_frame(&state, task, regs, 0, false) {
 		addr = unwind_get_return_address(&state);
 		if (!addr || !consume_entry(cookie, addr, false))
 			break;
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 96da99ec7b59..092626f4e7c6 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -29,7 +29,7 @@ static bool update_stack_info(struct unwind_state *state, unsigned long sp)
 	unsigned long *mask = &state->stack_mask;
 
 	/* New stack pointer leaves the current stack */
-	if (get_stack_info(sp, state->task, info, mask) != 0 ||
+	if (get_stack_info(sp, state->task, info, mask, false) != 0 ||
 	    !on_stack(info, sp, sizeof(struct stack_frame)))
 		/* 'sp' does not point to a valid stack */
 		return false;
@@ -79,7 +79,8 @@ static bool unwind_use_regs(struct unwind_state *state)
 	return false;
 }
 
-static bool unwind_use_frame(struct unwind_state *state, unsigned long sp)
+static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
+			     bool unwind_reliable)
 {
 	struct stack_frame *sf;
 	unsigned long ip;
@@ -100,7 +101,8 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp)
 	return false;
 }
 
-static bool unwind_look_for_regs(struct unwind_state *state)
+static bool unwind_look_for_regs(struct unwind_state *state,
+				 bool unwind_reliable)
 {
 	struct stack_info *info = &state->stack_info;
 	struct pt_regs *regs;
@@ -123,7 +125,7 @@ static bool unwind_look_for_regs(struct unwind_state *state)
 	return false;
 }
 
-bool unwind_next_frame(struct unwind_state *state)
+bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
 {
 	struct stack_frame *sf;
 	unsigned long sp;
@@ -136,15 +138,16 @@ bool unwind_next_frame(struct unwind_state *state)
 
 	/* Non-zero back-chain points to the previous frame */
 	if (likely(sp))
-		return unwind_use_frame(state, sp);
+		return unwind_use_frame(state, sp, unwind_reliable);
 
 	/* No back-chain, look for a pt_regs structure */
-	return unwind_look_for_regs(state);
+	return unwind_look_for_regs(state, unwind_reliable);
 }
 EXPORT_SYMBOL_GPL(unwind_next_frame);
 
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
-		    struct pt_regs *regs, unsigned long sp)
+		    struct pt_regs *regs, unsigned long sp,
+		    bool unwind_reliable)
 {
 	struct stack_info *info = &state->stack_info;
 	unsigned long *mask = &state->stack_mask;
@@ -163,7 +166,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	}
 
 	/* Get current stack pointer and initialize stack info */
-	if (get_stack_info(sp, task, info, mask) != 0 ||
+	if (get_stack_info(sp, task, info, mask, unwind_reliable) != 0 ||
 	    !on_stack(info, sp, sizeof(struct stack_frame))) {
 		/* Something is wrong with the stack pointer */
 		info->type = STACK_TYPE_UNKNOWN;
diff --git a/arch/s390/oprofile/init.c b/arch/s390/oprofile/init.c
index 7441857df51b..59d736f46cbc 100644
--- a/arch/s390/oprofile/init.c
+++ b/arch/s390/oprofile/init.c
@@ -19,7 +19,7 @@ static void s390_backtrace(struct pt_regs *regs, unsigned int depth)
 {
 	struct unwind_state state;
 
-	unwind_for_each_frame(&state, current, regs, 0) {
+	unwind_for_each_frame(&state, current, regs, 0, false) {
 		if (depth-- == 0)
 			break;
 		oprofile_add_trace(state.ip);
-- 
2.23.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model
  2019-11-06  9:55 [PATCH v3 0/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
                   ` (2 preceding siblings ...)
  2019-11-06  9:56 ` [PATCH v3 3/4] s390/unwind: prepare the unwinding interface for reliable stack traces Miroslav Benes
@ 2019-11-06  9:56 ` Miroslav Benes
  2019-11-29  7:41   ` Vasily Gorbik
  3 siblings, 1 reply; 12+ messages in thread
From: Miroslav Benes @ 2019-11-06  9:56 UTC (permalink / raw)
  To: heiko.carstens, gor, borntraeger, jpoimboe, joe.lawrence
  Cc: linux-s390, linux-kernel, jikos, pmladek, nstange, live-patching,
	Miroslav Benes

The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.

The "zSeries ELF Application Binary Interface Supplement" says:

  "The stack pointer points to the first word of the lowest allocated
  stack frame. If the "back chain" is implemented this word will point to
  the previously allocated stack frame (towards higher addresses), except
  for the first stack frame, which shall have a back chain of zero (NULL).
  The stack shall grow downwards, in other words towards lower addresses."

"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:

  "The compiler is called with the -mbackchain option, all normal C
  function will store the backchain in the function prologue. All
  functions written in assembler code should do the same, if you find one
  that does not we should fix that. The end result is that a task that
  *voluntarily* called schedule() should have a proper backchain at all
  times.

  Dependent on the use case this may or may not be enough. Asynchronous
  interrupts may stop the CPU at the beginning of a function, if kernel
  preemption is enabled we can end up with a broken backchain.  The
  production kernels for IBM Z are all compiled *without* kernel
  preemption. So yes, we might get away without the objtool support.

  On a side-note, we do have a line item to implement the ORC unwinder for
  the kernel, that includes the objtool support. Once we have that we can
  drop the -mbackchain option for the kernel build. That gives us a nice
  little performance benefit. I hope that the change from backchain to the
  ORC unwinder will not be too hard to implement in the livepatch tools."

Since -mbackchain is enabled by default when the kernel is compiled, the
call chain backtracing should be currently ensured and objtool should
not be necessary for livepatch purposes.

Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.

Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, the implementation can be much simpler
compared to the existing non-reliable infrastructure. We can consider a
task's kernel/thread stack only and skip the other stacks.

Idle tasks are a bit special. Their final back chains point to no_dat
stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
callback used in __cpu_up(). The unwinding is stopped there and it is
not considered to be a stack corruption.

[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 arch/s390/Kconfig             |  1 +
 arch/s390/kernel/dumpstack.c  | 11 +++++++
 arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++
 arch/s390/kernel/unwind_bc.c  | 61 +++++++++++++++++++++++++++--------
 4 files changed, 106 insertions(+), 13 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 43a81d0ad507..9cfec1000abd 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
 	select HAVE_PERF_EVENTS
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 1ee19e6336cd..0081e75b957c 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
 	if (!sp)
 		goto unknown;
 
+	/* Sanity check: ABI requires SP to be aligned 8 bytes. */
+	if (sp & 0x7)
+		goto unknown;
+
 	task = task ? : current;
 
 	/* Check per-task stack */
 	if (in_task_stack(sp, task, info))
 		goto recursion_check;
 
+	/*
+	 * The reliable unwinding should not start on nodat_stack, async_stack
+	 * or restart_stack. The task is either current or must be inactive.
+	 */
+	if (unwind_reliable)
+		goto unknown;
+
 	if (task != current)
 		goto unknown;
 
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index 751c136172f7..c5e3a37763f7 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -9,6 +9,7 @@
 #include <linux/stacktrace.h>
 #include <asm/stacktrace.h>
 #include <asm/unwind.h>
+#include <asm/kprobes.h>
 
 void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 		     struct task_struct *task, struct pt_regs *regs)
@@ -22,3 +23,48 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 			break;
 	}
 }
+
+/*
+ * This function returns an error if it detects any unreliable features of the
+ * stack.  Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
+ */
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+			     void *cookie, struct task_struct *task)
+{
+	struct unwind_state state;
+	unsigned long addr;
+
+	for (unwind_start(&state, task, NULL, 0, true);
+	     !unwind_done(&state) && !unwind_error(&state);
+	     unwind_next_frame(&state, true)) {
+
+		addr = unwind_get_return_address(&state);
+		if (!addr)
+			return -EINVAL;
+
+#ifdef CONFIG_KPROBES
+		/*
+		 * Mark stacktraces with kretprobed functions on them
+		 * as unreliable.
+		 */
+		if (state.ip == (unsigned long)kretprobe_trampoline)
+			return -EINVAL;
+#endif
+
+		if (!consume_entry(cookie, addr, false))
+			return -EINVAL;
+	}
+
+	/* Check for stack corruption */
+	if (unwind_error(&state))
+		return -EINVAL;
+
+	/* Store kernel_thread_starter, null for swapper/0 */
+	if ((task->flags & (PF_KTHREAD | PF_IDLE)) &&
+	    !consume_entry(cookie, state.regs->psw.addr, false))
+		return -EINVAL;
+
+	return 0;
+}
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 092626f4e7c6..9b6bc7539a9e 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -40,6 +40,10 @@ static bool unwind_update_state(struct unwind_state *state,
 				unsigned long sp, unsigned long ip,
 				struct pt_regs *regs, bool reliable)
 {
+	/* Sanity check: ABI requires SP to be aligned 8 bytes. */
+	if (sp & 0x7)
+		goto out_err;
+
 	ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
 				   ip, (void *) sp);
 
@@ -49,6 +53,11 @@ static bool unwind_update_state(struct unwind_state *state,
 	state->regs = regs;
 	state->reliable = reliable;
 	return true;
+
+out_err:
+	state->error = true;
+	state->stack_info.type = STACK_TYPE_UNKNOWN;
+	return false;
 }
 
 static bool unwind_use_regs(struct unwind_state *state)
@@ -85,10 +94,13 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
 	struct stack_frame *sf;
 	unsigned long ip;
 
-	if (unlikely(outside_of_stack(state, sp))) {
-		if (!update_stack_info(state, sp))
-			goto out_err;
-	}
+	/*
+	 * No need to update stack info when unwind_reliable is true. We should
+	 * be on a task stack and everything else is an error.
+	 */
+	if (unlikely(outside_of_stack(state, sp)) &&
+	    (unwind_reliable || !update_stack_info(state, sp)))
+		goto out_err;
 
 	sf = (struct stack_frame *) sp;
 	ip = READ_ONCE_NOCHECK(sf->gprs[8]);
@@ -109,17 +121,31 @@ static bool unwind_look_for_regs(struct unwind_state *state,
 	unsigned long sp, ip;
 
 	sp = state->sp + STACK_FRAME_OVERHEAD;
-	if (!on_stack(info, sp, sizeof(struct pt_regs)))
-		goto out_stop;
-
 	regs = (struct pt_regs *) sp;
-	if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
-		goto out_stop;
 
-	ip = READ_ONCE_NOCHECK(regs->psw.addr);
+	if (!unwind_reliable) {
+		if (!on_stack(info, sp, sizeof(struct pt_regs)))
+			goto out_stop;
 
-	return unwind_update_state(state, sp, ip, regs, true);
+		if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
+			goto out_stop;
 
+		ip = READ_ONCE_NOCHECK(regs->psw.addr);
+		return unwind_update_state(state, sp, ip, regs, true);
+	}
+
+	/* Unwind reliable */
+	if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
+		goto out_err;
+
+	if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) && !user_mode(regs))
+		goto out_err;
+
+	state->regs = regs;
+	goto out_stop;
+
+out_err:
+	state->error = true;
 out_stop:
 	state->stack_info.type = STACK_TYPE_UNKNOWN;
 	return false;
@@ -136,8 +162,17 @@ bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
 	sf = (struct stack_frame *) state->sp;
 	sp = READ_ONCE_NOCHECK(sf->back_chain);
 
-	/* Non-zero back-chain points to the previous frame */
-	if (likely(sp))
+	/*
+	 * Non-zero back-chain points to the previous frame
+	 *
+	 * unwind_reliable case: Idle tasks are special. The final
+	 * back-chain points to nodat_stack.  See CALL_ON_STACK() in
+	 * smp_start_secondary() callback used in __cpu_up(). We just
+	 * accept it and look for pt_regs.
+	 */
+	if (likely(sp) &&
+	    (!unwind_reliable || !(is_idle_task(state->task) &&
+				   outside_of_stack(state, sp))))
 		return unwind_use_frame(state, sp, unwind_reliable);
 
 	/* No back-chain, look for a pt_regs structure */
-- 
2.23.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
  2019-11-06  9:55 ` [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() Miroslav Benes
@ 2019-11-28 16:51   ` Vasily Gorbik
  0 siblings, 0 replies; 12+ messages in thread
From: Vasily Gorbik @ 2019-11-28 16:51 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching

On Wed, Nov 06, 2019 at 10:55:58AM +0100, Miroslav Benes wrote:
> The current code around calling ftrace_graph_ret_addr() is ifdeffed and
> also tests if ftrace redirection is present on stack.
> ftrace_graph_ret_addr() however performs the test internally and there
> is a version for !CONFIG_FUNCTION_GRAPH_TRACER as well. The unnecessary
> code can thus be dropped.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> ---
>  arch/s390/kernel/unwind_bc.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)

This patch has been picked up from v2 already. It's in Linus tree.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model
  2019-11-06  9:56 ` [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
@ 2019-11-29  7:41   ` Vasily Gorbik
  2019-11-29  7:41     ` [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks Vasily Gorbik
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vasily Gorbik @ 2019-11-29  7:41 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching

On Wed, Nov 06, 2019 at 10:56:01AM +0100, Miroslav Benes wrote:
> The livepatch consistency model requires reliable stack tracing
> architecture support in order to work properly. In order to achieve
> this, two main issues have to be solved. First, reliable and consistent
> call chain backtracing has to be ensured. Second, the unwinder needs to
> be able to detect stack corruptions and return errors.

I tried to address that in a patch series I pushed here:
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/log/?h=livepatch

It partially includes your changes from this patch which have been split
in 2 patches:
s390/unwind: add stack pointer alignment sanity checks
s390/livepatch: Implement reliable stack tracing for the consistency model

The primary goal was to make our backchain unwinder reliable itself. And
suitable for livepatching as is (we extra checks at the caller, see
below). Besides unwinder changes few things have been improved to avoid
special handling during unwinding.
- all tasks now have pt_regs at the bottom of task stack.
- final backchain always contains 0, no further references to no_dat stack.
- successful unwinding means reaching pt_regs at the bottom of task stack,
and unwinder guarantees that unwind_error() reflects that.
- final pt_regs at the bottom of task stack is never included in unwinding
results. It never was for user tasks. And kernel tasks cannot return to
that state anyway (and in some cases pt_regs are empty).
- unwinder starts unwinding from a reliable state (not "sp" passed as
an argument).
There is also s390 specific unwinder testing module.

> Idle tasks are a bit special. Their final back chains point to no_dat
> stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
> callback used in __cpu_up(). The unwinding is stopped there and it is
> not considered to be a stack corruption.
I changed that with commit:
s390: avoid misusing CALL_ON_STACK for task stack setup
Now idle tasks are not special, final back chain contains 0.

> ---
>  arch/s390/Kconfig             |  1 +
>  arch/s390/kernel/dumpstack.c  | 11 +++++++
>  arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++
>  arch/s390/kernel/unwind_bc.c  | 61 +++++++++++++++++++++++++++--------
>  4 files changed, 106 insertions(+), 13 deletions(-)
> 
> --- a/arch/s390/kernel/dumpstack.c
> +++ b/arch/s390/kernel/dumpstack.c
> @@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
>  	if (!sp)
>  		goto unknown;
>  
> +	/* Sanity check: ABI requires SP to be aligned 8 bytes. */
> +	if (sp & 0x7)
> +		goto unknown;
> +
This has been split in a separate commit:
s390/unwind: add stack pointer alignment sanity checks

> +	/*
> +	 * The reliable unwinding should not start on nodat_stack, async_stack
> +	 * or restart_stack. The task is either current or must be inactive.
> +	 */
> +	if (unwind_reliable)
> +		goto unknown;
I moved this check in arch_stack_walk_reliable itself. See below.

>  static bool unwind_use_regs(struct unwind_state *state)
> @@ -85,10 +94,13 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
>  	struct stack_frame *sf;
>  	unsigned long ip;
>  
> -	if (unlikely(outside_of_stack(state, sp))) {
> -		if (!update_stack_info(state, sp))
> -			goto out_err;
> -	}
> +	/*
> +	 * No need to update stack info when unwind_reliable is true. We should
> +	 * be on a task stack and everything else is an error.
> +	 */
> +	if (unlikely(outside_of_stack(state, sp)) &&
> +	    (unwind_reliable || !update_stack_info(state, sp)))
> +		goto out_err;
I moved this check in arch_stack_walk_reliable itself. See below.

> +	/* Unwind reliable */
> +	if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> +		goto out_err;
I moved this check in arch_stack_walk_reliable itself. See below.


> @@ -136,8 +162,17 @@ bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
>  	sf = (struct stack_frame *) state->sp;
>  	sp = READ_ONCE_NOCHECK(sf->back_chain);
>  
> -	/* Non-zero back-chain points to the previous frame */
> -	if (likely(sp))
> +	/*
> +	 * Non-zero back-chain points to the previous frame
> +	 *
> +	 * unwind_reliable case: Idle tasks are special. The final
> +	 * back-chain points to nodat_stack.  See CALL_ON_STACK() in
> +	 * smp_start_secondary() callback used in __cpu_up(). We just
> +	 * accept it and look for pt_regs.
> +	 */
> +	if (likely(sp) &&
> +	    (!unwind_reliable || !(is_idle_task(state->task) &&
> +				   outside_of_stack(state, sp))))
>  		return unwind_use_frame(state, sp, unwind_reliable);
This is not needed anymore.

In the end this all boils down to arch_stack_walk_reliable
implementation. I made the following changes compaired to your version:
---
- use plain unwind_for_each_frame
- "state.stack_info.type != STACK_TYPE_TASK" guarantees that we are
  not leaving task stack.
- "if (state.regs)" guarantees that we have not met an program interrupt
  pt_regs (page faults) or preempted. Corresponds to yours:
> +	if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> +		goto out_err;
- I don't see a point in storing "kernel_thread_starter", am I missing
  something?

diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index 79f323388e4d..fc5419ac64c8 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -36,9 +36,12 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
        struct unwind_state state;
        unsigned long addr;
 
-       for (unwind_start(&state, task, NULL, 0, true);
-            !unwind_done(&state) && !unwind_error(&state);
-            unwind_next_frame(&state, true)) {
+       unwind_for_each_frame(&state, task, NULL, 0) {
+               if (state.stack_info.type != STACK_TYPE_TASK)
+                       return -EINVAL;
+
+               if (state.regs)
+                       return -EINVAL;
 
                addr = unwind_get_return_address(&state);
                if (!addr)
@@ -60,11 +63,5 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
        /* Check for stack corruption */
        if (unwind_error(&state))
                return -EINVAL;
-
-       /* Store kernel_thread_starter, null for swapper/0 */
-       if ((task->flags & (PF_KTHREAD | PF_IDLE)) &&
-           !consume_entry(cookie, state.regs->psw.addr, false))
-               return -EINVAL;
-
        return 0;
 }
--
I'd appreciate if you could give those changes a spin. And check if
something is missing or broken. Or share your livepatching testing
procedure. If everything goes well, I might include these changes in
second pull request for this 5.5 merge window.

I did successfully run ./testing/selftests/livepatch/test-livepatch.sh

https://github.com/lpechacek/qa_test_klp seems outdated. I was able to
fix and run some tests of it but haven't had time to figure out all of
them. Is there a version that would run on top of current Linus tree?

Since I changed your last patch and split it in 2, could you please give
me your Signed-off-by for those 2 commits?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks
  2019-11-29  7:41   ` Vasily Gorbik
@ 2019-11-29  7:41     ` Vasily Gorbik
  2019-11-29 18:16       ` Miroslav Benes
  2019-11-29  7:41     ` [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model Vasily Gorbik
  2019-11-29 18:16     ` [PATCH v3 4/4] " Miroslav Benes
  2 siblings, 1 reply; 12+ messages in thread
From: Vasily Gorbik @ 2019-11-29  7:41 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching

From: Miroslav Benes <mbenes@suse.cz>

ABI requires SP to be aligned 8 bytes, report unwinding error otherwise.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 arch/s390/kernel/dumpstack.c | 4 ++++
 arch/s390/kernel/unwind_bc.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index d74e21a23703..d306fe04489a 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -94,6 +94,10 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
 	if (!sp)
 		goto unknown;
 
+	/* Sanity check: ABI requires SP to be aligned 8 bytes. */
+	if (sp & 0x7)
+		goto unknown;
+
 	/* Check per-task stack */
 	if (in_task_stack(sp, task, info))
 		goto recursion_check;
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index ef42d5f77ce7..da2d4d4c5b0e 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -92,6 +92,10 @@ bool unwind_next_frame(struct unwind_state *state)
 		}
 	}
 
+	/* Sanity check: ABI requires SP to be aligned 8 bytes. */
+	if (sp & 0x7)
+		goto out_err;
+
 	ip = ftrace_graph_ret_addr(state->task, &state->graph_idx, ip, (void *) sp);
 
 	/* Update unwind state */
-- 
2.21.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model
  2019-11-29  7:41   ` Vasily Gorbik
  2019-11-29  7:41     ` [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks Vasily Gorbik
@ 2019-11-29  7:41     ` Vasily Gorbik
  2019-11-29 18:16       ` Miroslav Benes
  2019-11-29 18:16     ` [PATCH v3 4/4] " Miroslav Benes
  2 siblings, 1 reply; 12+ messages in thread
From: Vasily Gorbik @ 2019-11-29  7:41 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching

From: Miroslav Benes <mbenes@suse.cz>

The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.

The "zSeries ELF Application Binary Interface Supplement" says:

  "The stack pointer points to the first word of the lowest allocated
  stack frame. If the "back chain" is implemented this word will point to
  the previously allocated stack frame (towards higher addresses), except
  for the first stack frame, which shall have a back chain of zero (NULL).
  The stack shall grow downwards, in other words towards lower addresses."

"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:

  "The compiler is called with the -mbackchain option, all normal C
  function will store the backchain in the function prologue. All
  functions written in assembler code should do the same, if you find one
  that does not we should fix that. The end result is that a task that
  *voluntarily* called schedule() should have a proper backchain at all
  times.

  Dependent on the use case this may or may not be enough. Asynchronous
  interrupts may stop the CPU at the beginning of a function, if kernel
  preemption is enabled we can end up with a broken backchain.  The
  production kernels for IBM Z are all compiled *without* kernel
  preemption. So yes, we might get away without the objtool support.

  On a side-note, we do have a line item to implement the ORC unwinder for
  the kernel, that includes the objtool support. Once we have that we can
  drop the -mbackchain option for the kernel build. That gives us a nice
  little performance benefit. I hope that the change from backchain to the
  ORC unwinder will not be too hard to implement in the livepatch tools."

Since -mbackchain is enabled by default when the kernel is compiled, the
call chain backtracing should be currently ensured and objtool should
not be necessary for livepatch purposes.

Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.

Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, we can consider a task's kernel/thread stack
only and skip the other stacks.

[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 arch/s390/Kconfig             |  1 +
 arch/s390/kernel/stacktrace.c | 43 +++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 2528eb9d01fb..367a87c5d7b8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
 	select HAVE_PERF_EVENTS
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index f8fc4f8aef9b..fc5419ac64c8 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -9,6 +9,7 @@
 #include <linux/stacktrace.h>
 #include <asm/stacktrace.h>
 #include <asm/unwind.h>
+#include <asm/kprobes.h>
 
 void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 		     struct task_struct *task, struct pt_regs *regs)
@@ -22,3 +23,45 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 			break;
 	}
 }
+
+/*
+ * This function returns an error if it detects any unreliable features of the
+ * stack.  Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
+ */
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+			     void *cookie, struct task_struct *task)
+{
+	struct unwind_state state;
+	unsigned long addr;
+
+	unwind_for_each_frame(&state, task, NULL, 0) {
+		if (state.stack_info.type != STACK_TYPE_TASK)
+			return -EINVAL;
+
+		if (state.regs)
+			return -EINVAL;
+
+		addr = unwind_get_return_address(&state);
+		if (!addr)
+			return -EINVAL;
+
+#ifdef CONFIG_KPROBES
+		/*
+		 * Mark stacktraces with kretprobed functions on them
+		 * as unreliable.
+		 */
+		if (state.ip == (unsigned long)kretprobe_trampoline)
+			return -EINVAL;
+#endif
+
+		if (!consume_entry(cookie, addr, false))
+			return -EINVAL;
+	}
+
+	/* Check for stack corruption */
+	if (unwind_error(&state))
+		return -EINVAL;
+	return 0;
+}
-- 
2.21.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model
  2019-11-29  7:41   ` Vasily Gorbik
  2019-11-29  7:41     ` [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks Vasily Gorbik
  2019-11-29  7:41     ` [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model Vasily Gorbik
@ 2019-11-29 18:16     ` " Miroslav Benes
  2 siblings, 0 replies; 12+ messages in thread
From: Miroslav Benes @ 2019-11-29 18:16 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching, lpechacek

On Fri, 29 Nov 2019, Vasily Gorbik wrote:

> On Wed, Nov 06, 2019 at 10:56:01AM +0100, Miroslav Benes wrote:
> > The livepatch consistency model requires reliable stack tracing
> > architecture support in order to work properly. In order to achieve
> > this, two main issues have to be solved. First, reliable and consistent
> > call chain backtracing has to be ensured. Second, the unwinder needs to
> > be able to detect stack corruptions and return errors.
> 
> I tried to address that in a patch series I pushed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/log/?h=livepatch
> 
> It partially includes your changes from this patch which have been split
> in 2 patches:
> s390/unwind: add stack pointer alignment sanity checks
> s390/livepatch: Implement reliable stack tracing for the consistency model
> 
> The primary goal was to make our backchain unwinder reliable itself. And
> suitable for livepatching as is (we extra checks at the caller, see
> below). Besides unwinder changes few things have been improved to avoid
> special handling during unwinding.
> - all tasks now have pt_regs at the bottom of task stack.
> - final backchain always contains 0, no further references to no_dat stack.
> - successful unwinding means reaching pt_regs at the bottom of task stack,
> and unwinder guarantees that unwind_error() reflects that.
> - final pt_regs at the bottom of task stack is never included in unwinding
> results. It never was for user tasks. And kernel tasks cannot return to
> that state anyway (and in some cases pt_regs are empty).
> - unwinder starts unwinding from a reliable state (not "sp" passed as
> an argument).

Great. Thanks for doing that. It all looks good to me and the outcome is 
definitely better. I obviously still had some misconceptions about the 
code and it is much clearer now.

> There is also s390 specific unwinder testing module.
>
> > Idle tasks are a bit special. Their final back chains point to no_dat
> > stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
> > callback used in __cpu_up(). The unwinding is stopped there and it is
> > not considered to be a stack corruption.
> I changed that with commit:
> s390: avoid misusing CALL_ON_STACK for task stack setup
> Now idle tasks are not special, final back chain contains 0.
> 
> > ---
> >  arch/s390/Kconfig             |  1 +
> >  arch/s390/kernel/dumpstack.c  | 11 +++++++
> >  arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++
> >  arch/s390/kernel/unwind_bc.c  | 61 +++++++++++++++++++++++++++--------
> >  4 files changed, 106 insertions(+), 13 deletions(-)
> > 
> > --- a/arch/s390/kernel/dumpstack.c
> > +++ b/arch/s390/kernel/dumpstack.c
> > @@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
> >  	if (!sp)
> >  		goto unknown;
> >  
> > +	/* Sanity check: ABI requires SP to be aligned 8 bytes. */
> > +	if (sp & 0x7)
> > +		goto unknown;
> > +
> This has been split in a separate commit:
> s390/unwind: add stack pointer alignment sanity checks
> 
> > +	/*
> > +	 * The reliable unwinding should not start on nodat_stack, async_stack
> > +	 * or restart_stack. The task is either current or must be inactive.
> > +	 */
> > +	if (unwind_reliable)
> > +		goto unknown;
> I moved this check in arch_stack_walk_reliable itself. See below.
> 
> >  static bool unwind_use_regs(struct unwind_state *state)
> > @@ -85,10 +94,13 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
> >  	struct stack_frame *sf;
> >  	unsigned long ip;
> >  
> > -	if (unlikely(outside_of_stack(state, sp))) {
> > -		if (!update_stack_info(state, sp))
> > -			goto out_err;
> > -	}
> > +	/*
> > +	 * No need to update stack info when unwind_reliable is true. We should
> > +	 * be on a task stack and everything else is an error.
> > +	 */
> > +	if (unlikely(outside_of_stack(state, sp)) &&
> > +	    (unwind_reliable || !update_stack_info(state, sp)))
> > +		goto out_err;
> I moved this check in arch_stack_walk_reliable itself. See below.
> 
> > +	/* Unwind reliable */
> > +	if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> > +		goto out_err;
> I moved this check in arch_stack_walk_reliable itself. See below.
> 
> 
> > @@ -136,8 +162,17 @@ bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
> >  	sf = (struct stack_frame *) state->sp;
> >  	sp = READ_ONCE_NOCHECK(sf->back_chain);
> >  
> > -	/* Non-zero back-chain points to the previous frame */
> > -	if (likely(sp))
> > +	/*
> > +	 * Non-zero back-chain points to the previous frame
> > +	 *
> > +	 * unwind_reliable case: Idle tasks are special. The final
> > +	 * back-chain points to nodat_stack.  See CALL_ON_STACK() in
> > +	 * smp_start_secondary() callback used in __cpu_up(). We just
> > +	 * accept it and look for pt_regs.
> > +	 */
> > +	if (likely(sp) &&
> > +	    (!unwind_reliable || !(is_idle_task(state->task) &&
> > +				   outside_of_stack(state, sp))))
> >  		return unwind_use_frame(state, sp, unwind_reliable);
> This is not needed anymore.
> 
> In the end this all boils down to arch_stack_walk_reliable
> implementation. I made the following changes compaired to your version:
> ---
> - use plain unwind_for_each_frame
> - "state.stack_info.type != STACK_TYPE_TASK" guarantees that we are
>   not leaving task stack.
> - "if (state.regs)" guarantees that we have not met an program interrupt
>   pt_regs (page faults) or preempted. Corresponds to yours:
> > +	if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> > +		goto out_err;

Agreed. All this should be equivalent to the changes I made.

> - I don't see a point in storing "kernel_thread_starter", am I missing
>   something?

No, you don't. It was just for the sake of completeness and it is not 
worth it.

[...]
 
> I'd appreciate if you could give those changes a spin. And check if
> something is missing or broken. Or share your livepatching testing
> procedure. If everything goes well, I might include these changes in
> second pull request for this 5.5 merge window.
> 
> I did successfully run ./testing/selftests/livepatch/test-livepatch.sh

'make run_tests' in tools/testing/selftests/livepatch/ would call all the 
tests in the directory. Especially test-callbacks.sh which stresses the 
livepatching even more.
 
> https://github.com/lpechacek/qa_test_klp seems outdated. I was able to
> fix and run some tests of it but haven't had time to figure out all of
> them. Is there a version that would run on top of current Linus tree?

Ah, sorry. I should have mentioned that. The code became outdated with 
recent upstream changes. Libor is working on the updates (CCed).

Anyway, I ran it here and it all works fine.

Tested-by: Miroslav Benes <mbenes@suse.cz>

Thanks a lot
Miroslav

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks
  2019-11-29  7:41     ` [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks Vasily Gorbik
@ 2019-11-29 18:16       ` Miroslav Benes
  0 siblings, 0 replies; 12+ messages in thread
From: Miroslav Benes @ 2019-11-29 18:16 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching

On Fri, 29 Nov 2019, Vasily Gorbik wrote:

> From: Miroslav Benes <mbenes@suse.cz>
> 
> ABI requires SP to be aligned 8 bytes, report unwinding error otherwise.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

Signed-off-by: Miroslav Benes <mbenes@suse.cz>

M

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model
  2019-11-29  7:41     ` [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model Vasily Gorbik
@ 2019-11-29 18:16       ` Miroslav Benes
  0 siblings, 0 replies; 12+ messages in thread
From: Miroslav Benes @ 2019-11-29 18:16 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: heiko.carstens, borntraeger, jpoimboe, joe.lawrence, linux-s390,
	linux-kernel, jikos, pmladek, nstange, live-patching

On Fri, 29 Nov 2019, Vasily Gorbik wrote:

> From: Miroslav Benes <mbenes@suse.cz>
> 
> The livepatch consistency model requires reliable stack tracing
> architecture support in order to work properly. In order to achieve
> this, two main issues have to be solved. First, reliable and consistent
> call chain backtracing has to be ensured. Second, the unwinder needs to
> be able to detect stack corruptions and return errors.
> 
> The "zSeries ELF Application Binary Interface Supplement" says:
> 
>   "The stack pointer points to the first word of the lowest allocated
>   stack frame. If the "back chain" is implemented this word will point to
>   the previously allocated stack frame (towards higher addresses), except
>   for the first stack frame, which shall have a back chain of zero (NULL).
>   The stack shall grow downwards, in other words towards lower addresses."
> 
> "back chain" is optional. GCC option -mbackchain enables it. Quoting
> Martin Schwidefsky [1]:
> 
>   "The compiler is called with the -mbackchain option, all normal C
>   function will store the backchain in the function prologue. All
>   functions written in assembler code should do the same, if you find one
>   that does not we should fix that. The end result is that a task that
>   *voluntarily* called schedule() should have a proper backchain at all
>   times.
> 
>   Dependent on the use case this may or may not be enough. Asynchronous
>   interrupts may stop the CPU at the beginning of a function, if kernel
>   preemption is enabled we can end up with a broken backchain.  The
>   production kernels for IBM Z are all compiled *without* kernel
>   preemption. So yes, we might get away without the objtool support.
> 
>   On a side-note, we do have a line item to implement the ORC unwinder for
>   the kernel, that includes the objtool support. Once we have that we can
>   drop the -mbackchain option for the kernel build. That gives us a nice
>   little performance benefit. I hope that the change from backchain to the
>   ORC unwinder will not be too hard to implement in the livepatch tools."
> 
> Since -mbackchain is enabled by default when the kernel is compiled, the
> call chain backtracing should be currently ensured and objtool should
> not be necessary for livepatch purposes.
> 
> Regarding the second issue, stack corruptions and non-reliable states
> have to be recognized by the unwinder. Mainly it means to detect
> preemption or page faults, the end of the task stack must be reached,
> return addresses must be valid text addresses and hacks like function
> graph tracing and kretprobes must be properly detected.
> 
> Unwinding a running task's stack is not a problem, because there is a
> livepatch requirement that every checked task is blocked, except for the
> current task. Due to that, we can consider a task's kernel/thread stack
> only and skip the other stacks.
> 
> [1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

Signed-off-by: Miroslav Benes <mbenes@suse.cz>

M

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, back to index

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-06  9:55 [PATCH v3 0/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
2019-11-06  9:55 ` [PATCH v3 1/4] s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() Miroslav Benes
2019-11-28 16:51   ` Vasily Gorbik
2019-11-06  9:55 ` [PATCH v3 2/4] s390/unwind: split unwind_next_frame() to several functions Miroslav Benes
2019-11-06  9:56 ` [PATCH v3 3/4] s390/unwind: prepare the unwinding interface for reliable stack traces Miroslav Benes
2019-11-06  9:56 ` [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model Miroslav Benes
2019-11-29  7:41   ` Vasily Gorbik
2019-11-29  7:41     ` [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks Vasily Gorbik
2019-11-29 18:16       ` Miroslav Benes
2019-11-29  7:41     ` [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model Vasily Gorbik
2019-11-29 18:16       ` Miroslav Benes
2019-11-29 18:16     ` [PATCH v3 4/4] " Miroslav Benes

Live-Patching Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/live-patching/0 live-patching/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 live-patching live-patching/ https://lore.kernel.org/live-patching \
		live-patching@vger.kernel.org
	public-inbox-index live-patching

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.live-patching


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git