From: Nicolai Stange <nstange@suse.de> To: Joe Lawrence <joe.lawrence@redhat.com> Cc: Nicolai Stange <nstange@suse.de>, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, live-patching@vger.kernel.org, Torsten Duwe <duwe@lst.de>, Michael Ellerman <mpe@ellerman.id.au>, Jiri Kosina <jkosina@suse.cz>, Balbir Singh <bsingharora@gmail.com> Subject: Re: ppc64le reliable stack unwinder and scheduled tasks Date: Mon, 14 Jan 2019 08:21:40 +0100 [thread overview] Message-ID: <877ef7bt6j.fsf@suse.de> (raw) In-Reply-To: <20190114040937.GA6739@redhat.com> (Joe Lawrence's message of "Sun, 13 Jan 2019 23:09:37 -0500") Joe Lawrence <joe.lawrence@redhat.com> writes: > On Fri, Jan 11, 2019 at 08:51:54AM +0100, Nicolai Stange wrote: >> Joe Lawrence <joe.lawrence@redhat.com> writes: >> >> > On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote: > > [ ... snip ... ] > >> >> For testing, could you try whether clearing the word at STACK_FRAME_MARKER >> >> from _switch() helps? >> >> >> >> I.e. something like (completely untested): >> > >> > I'll kick off some builds tonight and try to get tests lined up >> > tomorrow. Unfortunately these take a bit of time to run setup, schedule >> > and complete, so perhaps by next week. >> >> Ok, that's probably still a good test for confirmation, but the solution >> you proposed below is much better. > > Good news, 40 tests (RHEL-7 backport) with clearing out the > STACK_FRAME_MARKER were all successful. Previously I would see stalled > livepatch transitions due to the unreliable report in ~10% of test > cases. > >> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S >> >> index 435927f549c4..b747d0647ec4 100644 >> >> --- a/arch/powerpc/kernel/entry_64.S >> >> +++ b/arch/powerpc/kernel/entry_64.S >> >> @@ -596,6 +596,10 @@ _GLOBAL(_switch) >> >> SAVE_8GPRS(14, r1) >> >> SAVE_10GPRS(22, r1) >> >> std r0,_NIP(r1) /* Return to switch caller */ >> >> + >> >> + li r23,0 >> >> + std r23,96(r1) /* 96 == STACK_FRAME_MARKER * sizeof(long) */ >> >> + >> >> mfcr r23 >> >> std r23,_CCR(r1) >> >> std r1,KSP(r3) /* Set old stack pointer */ >> >> >> >> >> > >> > This may be sufficient to avoid the condition, but if the contents of >> > frame 0 are truely uninitialized (not to be trusted), should the >> > unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER), >> > aside from the LR and other stack size geometry sanity checks? >> >> That's a very good point: we'll only ever be examining scheduled out tasks >> and current (which at that time is running klp_try_complete_transition()). >> >> current won't be in an interrupt/exception when it's walking its own >> stack. And scheduled out tasks can't have an exception/interrupt frame >> as their frame 0, correct? >> >> Thus, AFAICS, whenever klp_try_complete_transition() finds a >> STACK_FRAMES_REGS_MARKER in frame 0, it is known to be garbage, as you >> said. > > I gave this about 20 tests and they were also all successful, what do > you think? (The log could probably use some trimming and cleanup.) I > think either solution would fix the issue. > > -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- > > From edd3fb9e8a16d5a7ccc98759e9397f386f0e82ca Mon Sep 17 00:00:00 2001 > From: Joe Lawrence <joe.lawrence@redhat.com> > Date: Fri, 11 Jan 2019 10:25:26 -0500 > Subject: [PATCH] powerpc/livepatch: relax reliable stack tracer exception > check > > The ppc64le reliable stack tracer iterates over a given task's stack, > verifying a set of conditions to determine whether the trace is > 'reliable'. These checks include the presence of any exception frames. > > We should be careful when inspecting the bottom-most stack frame (the > first to be unwound), particularly for scheduled-out tasks. As Nicolai > Stange explains, "If I'm reading the code in _switch() correctly, the > first frame is completely uninitialized except for the pointer back to > the caller's stack frame." If a previous do_IRQ() invocation, for > example, has left a residual exception-marker on the first frame, the > stack tracer would incorrectly report this task's trace as unreliable. > FWIW, it's not been do_IRQ() who wrote the exception marker, but it's caller hardware_interrupt_common(), more specifically the EXCEPTION_PROLOG_COMMON_3 part therof. > save_stack_trace_tsk_reliable() already skips the first frame when > verifying the saved Link Register. It should do the same when looking > for the STACK_FRAME_REGS_MARKER. The task it is unwinding will be > either 'current', in which case the tracer will have never been called > from an exception path, or the task will be inactive and its first > frame's contents will be uninitialized (aside from backchain pointer). I thought about this a little more and can't see anything that would prevent higher, i.e. non-_switch() frames to also alias with prior exception frames. That STACK_FRAME_REGS_MARKER is written to a stack frame's "parameter area" and most functions probably don't initialize this either. So, AFAICS, higher stack frames could potentially be affected by the very same problem. I think the best solution would be to clear the STACK_FRAME_REGS_MARKER upon exception return. I have a patch ready for that and will post it after it has passed some basic testing -- hopefully later this day. That being said, I still think that your patch should also get applied in some form -- looking at unitialized memory is just not a good thing to do. > > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model") > Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> > --- > arch/powerpc/kernel/stacktrace.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c > index e2c50b55138f..0793e75c45a6 100644 > --- a/arch/powerpc/kernel/stacktrace.c > +++ b/arch/powerpc/kernel/stacktrace.c > @@ -84,6 +84,12 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace) > EXPORT_SYMBOL_GPL(save_stack_trace_regs); > > #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE > +/* > + * This function returns an error if it detects any unreliable features of the > + * stack. Otherwise it guarantees that the stack trace is reliable. > + * > + * If the task is not 'current', the caller *must* ensure the task is inactive. > + */ > int > save_stack_trace_tsk_reliable(struct task_struct *tsk, > struct stack_trace *trace) > @@ -143,7 +149,7 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk, > return 1; > > /* Mark stacktraces with exception frames as unreliable. */ > - if (sp <= stack_end - STACK_INT_FRAME_SIZE && > + if (!firstframe && sp <= stack_end - STACK_INT_FRAME_SIZE && > stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) { > return 1; > } I would perhaps not limit this to the STACK_FRAME_REGS_MARKER, but also not emit the ip obtained from the first frame into the resulting trace. I.e., how about moving all the sp/newsp handling to the beginning of the loop and doing an 'if (firstframe) continue;' right after that? Thanks, Nicolai -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
WARNING: multiple messages have this Message-ID (diff)
From: Nicolai Stange <nstange@suse.de> To: Joe Lawrence <joe.lawrence@redhat.com> Cc: Nicolai Stange <nstange@suse.de>, linux-kernel@vger.kernel.org, Torsten Duwe <duwe@lst.de>, Jiri Kosina <jkosina@suse.cz>, live-patching@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: Re: ppc64le reliable stack unwinder and scheduled tasks Date: Mon, 14 Jan 2019 08:21:40 +0100 [thread overview] Message-ID: <877ef7bt6j.fsf@suse.de> (raw) In-Reply-To: <20190114040937.GA6739@redhat.com> (Joe Lawrence's message of "Sun, 13 Jan 2019 23:09:37 -0500") Joe Lawrence <joe.lawrence@redhat.com> writes: > On Fri, Jan 11, 2019 at 08:51:54AM +0100, Nicolai Stange wrote: >> Joe Lawrence <joe.lawrence@redhat.com> writes: >> >> > On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote: > > [ ... snip ... ] > >> >> For testing, could you try whether clearing the word at STACK_FRAME_MARKER >> >> from _switch() helps? >> >> >> >> I.e. something like (completely untested): >> > >> > I'll kick off some builds tonight and try to get tests lined up >> > tomorrow. Unfortunately these take a bit of time to run setup, schedule >> > and complete, so perhaps by next week. >> >> Ok, that's probably still a good test for confirmation, but the solution >> you proposed below is much better. > > Good news, 40 tests (RHEL-7 backport) with clearing out the > STACK_FRAME_MARKER were all successful. Previously I would see stalled > livepatch transitions due to the unreliable report in ~10% of test > cases. > >> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S >> >> index 435927f549c4..b747d0647ec4 100644 >> >> --- a/arch/powerpc/kernel/entry_64.S >> >> +++ b/arch/powerpc/kernel/entry_64.S >> >> @@ -596,6 +596,10 @@ _GLOBAL(_switch) >> >> SAVE_8GPRS(14, r1) >> >> SAVE_10GPRS(22, r1) >> >> std r0,_NIP(r1) /* Return to switch caller */ >> >> + >> >> + li r23,0 >> >> + std r23,96(r1) /* 96 == STACK_FRAME_MARKER * sizeof(long) */ >> >> + >> >> mfcr r23 >> >> std r23,_CCR(r1) >> >> std r1,KSP(r3) /* Set old stack pointer */ >> >> >> >> >> > >> > This may be sufficient to avoid the condition, but if the contents of >> > frame 0 are truely uninitialized (not to be trusted), should the >> > unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER), >> > aside from the LR and other stack size geometry sanity checks? >> >> That's a very good point: we'll only ever be examining scheduled out tasks >> and current (which at that time is running klp_try_complete_transition()). >> >> current won't be in an interrupt/exception when it's walking its own >> stack. And scheduled out tasks can't have an exception/interrupt frame >> as their frame 0, correct? >> >> Thus, AFAICS, whenever klp_try_complete_transition() finds a >> STACK_FRAMES_REGS_MARKER in frame 0, it is known to be garbage, as you >> said. > > I gave this about 20 tests and they were also all successful, what do > you think? (The log could probably use some trimming and cleanup.) I > think either solution would fix the issue. > > -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- > > From edd3fb9e8a16d5a7ccc98759e9397f386f0e82ca Mon Sep 17 00:00:00 2001 > From: Joe Lawrence <joe.lawrence@redhat.com> > Date: Fri, 11 Jan 2019 10:25:26 -0500 > Subject: [PATCH] powerpc/livepatch: relax reliable stack tracer exception > check > > The ppc64le reliable stack tracer iterates over a given task's stack, > verifying a set of conditions to determine whether the trace is > 'reliable'. These checks include the presence of any exception frames. > > We should be careful when inspecting the bottom-most stack frame (the > first to be unwound), particularly for scheduled-out tasks. As Nicolai > Stange explains, "If I'm reading the code in _switch() correctly, the > first frame is completely uninitialized except for the pointer back to > the caller's stack frame." If a previous do_IRQ() invocation, for > example, has left a residual exception-marker on the first frame, the > stack tracer would incorrectly report this task's trace as unreliable. > FWIW, it's not been do_IRQ() who wrote the exception marker, but it's caller hardware_interrupt_common(), more specifically the EXCEPTION_PROLOG_COMMON_3 part therof. > save_stack_trace_tsk_reliable() already skips the first frame when > verifying the saved Link Register. It should do the same when looking > for the STACK_FRAME_REGS_MARKER. The task it is unwinding will be > either 'current', in which case the tracer will have never been called > from an exception path, or the task will be inactive and its first > frame's contents will be uninitialized (aside from backchain pointer). I thought about this a little more and can't see anything that would prevent higher, i.e. non-_switch() frames to also alias with prior exception frames. That STACK_FRAME_REGS_MARKER is written to a stack frame's "parameter area" and most functions probably don't initialize this either. So, AFAICS, higher stack frames could potentially be affected by the very same problem. I think the best solution would be to clear the STACK_FRAME_REGS_MARKER upon exception return. I have a patch ready for that and will post it after it has passed some basic testing -- hopefully later this day. That being said, I still think that your patch should also get applied in some form -- looking at unitialized memory is just not a good thing to do. > > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model") > Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> > --- > arch/powerpc/kernel/stacktrace.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c > index e2c50b55138f..0793e75c45a6 100644 > --- a/arch/powerpc/kernel/stacktrace.c > +++ b/arch/powerpc/kernel/stacktrace.c > @@ -84,6 +84,12 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace) > EXPORT_SYMBOL_GPL(save_stack_trace_regs); > > #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE > +/* > + * This function returns an error if it detects any unreliable features of the > + * stack. Otherwise it guarantees that the stack trace is reliable. > + * > + * If the task is not 'current', the caller *must* ensure the task is inactive. > + */ > int > save_stack_trace_tsk_reliable(struct task_struct *tsk, > struct stack_trace *trace) > @@ -143,7 +149,7 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk, > return 1; > > /* Mark stacktraces with exception frames as unreliable. */ > - if (sp <= stack_end - STACK_INT_FRAME_SIZE && > + if (!firstframe && sp <= stack_end - STACK_INT_FRAME_SIZE && > stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) { > return 1; > } I would perhaps not limit this to the STACK_FRAME_REGS_MARKER, but also not emit the ip obtained from the first frame into the resulting trace. I.e., how about moving all the sp/newsp handling to the beginning of the loop and doing an 'if (firstframe) continue;' right after that? Thanks, Nicolai -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
next prev parent reply other threads:[~2019-01-14 7:21 UTC|newest] Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-01-10 21:14 ppc64le reliable stack unwinder and scheduled tasks Joe Lawrence 2019-01-10 21:14 ` Joe Lawrence 2019-01-11 0:00 ` Nicolai Stange 2019-01-11 0:00 ` Nicolai Stange 2019-01-11 1:08 ` Joe Lawrence 2019-01-11 1:08 ` Joe Lawrence 2019-01-11 7:51 ` Nicolai Stange 2019-01-11 7:51 ` Nicolai Stange 2019-01-14 4:09 ` Joe Lawrence 2019-01-14 4:09 ` Joe Lawrence 2019-01-14 7:21 ` Nicolai Stange [this message] 2019-01-14 7:21 ` Nicolai Stange 2019-01-14 16:46 ` Joe Lawrence 2019-01-14 16:46 ` Joe Lawrence 2019-01-14 17:09 ` Josh Poimboeuf 2019-01-14 17:09 ` Josh Poimboeuf 2019-01-14 17:54 ` Joe Lawrence 2019-01-14 17:54 ` Joe Lawrence 2019-01-12 1:09 ` Balbir Singh 2019-01-12 1:09 ` Balbir Singh 2019-01-12 8:45 ` Segher Boessenkool 2019-01-13 12:33 ` Balbir Singh 2019-01-13 13:05 ` Torsten Duwe 2019-01-13 13:05 ` Torsten Duwe 2019-01-17 14:52 ` Segher Boessenkool
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=877ef7bt6j.fsf@suse.de \ --to=nstange@suse.de \ --cc=bsingharora@gmail.com \ --cc=duwe@lst.de \ --cc=jkosina@suse.cz \ --cc=joe.lawrence@redhat.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linuxppc-dev@lists.ozlabs.org \ --cc=live-patching@vger.kernel.org \ --cc=mpe@ellerman.id.au \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.