All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolai Stange <nstange@suse.de>
To: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	live-patching@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Jiri Kosina <jkosina@suse.cz>,
	Balbir Singh <bsingharora@gmail.com>
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Mon, 14 Jan 2019 08:21:40 +0100	[thread overview]
Message-ID: <877ef7bt6j.fsf@suse.de> (raw)
In-Reply-To: <20190114040937.GA6739@redhat.com> (Joe Lawrence's message of "Sun, 13 Jan 2019 23:09:37 -0500")

Joe Lawrence <joe.lawrence@redhat.com> writes:

> On Fri, Jan 11, 2019 at 08:51:54AM +0100, Nicolai Stange wrote:
>> Joe Lawrence <joe.lawrence@redhat.com> writes:
>> 
>> > On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote:
>
> [ ... snip ... ]
>
>> >> For testing, could you try whether clearing the word at STACK_FRAME_MARKER
>> >> from _switch() helps?
>> >> 
>> >> I.e. something like (completely untested):
>> >
>> > I'll kick off some builds tonight and try to get tests lined up
>> > tomorrow.  Unfortunately these take a bit of time to run setup, schedule
>> > and complete, so perhaps by next week.
>> 
>> Ok, that's probably still a good test for confirmation, but the solution
>> you proposed below is much better.
>
> Good news, 40 tests (RHEL-7 backport) with clearing out the
> STACK_FRAME_MARKER were all successful.  Previously I would see stalled
> livepatch transitions due to the unreliable report in ~10% of test
> cases.
>
>> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> >> index 435927f549c4..b747d0647ec4 100644
>> >> --- a/arch/powerpc/kernel/entry_64.S
>> >> +++ b/arch/powerpc/kernel/entry_64.S
>> >> @@ -596,6 +596,10 @@ _GLOBAL(_switch)
>> >>  	SAVE_8GPRS(14, r1)
>> >>  	SAVE_10GPRS(22, r1)
>> >>  	std	r0,_NIP(r1)	/* Return to switch caller */
>> >> +
>> >> +	li	r23,0
>> >> +	std	r23,96(r1)	/* 96 == STACK_FRAME_MARKER * sizeof(long) */
>> >> +
>> >>  	mfcr	r23
>> >>  	std	r23,_CCR(r1)
>> >>  	std	r1,KSP(r3)	/* Set old stack pointer */
>> >> 
>> >> 
>> >
>> > This may be sufficient to avoid the condition, but if the contents of
>> > frame 0 are truely uninitialized (not to be trusted), should the
>> > unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER),
>> > aside from the LR and other stack size geometry sanity checks?
>> 
>> That's a very good point: we'll only ever be examining scheduled out tasks
>> and current (which at that time is running klp_try_complete_transition()).
>> 
>> current won't be in an interrupt/exception when it's walking its own
>> stack. And scheduled out tasks can't have an exception/interrupt frame
>> as their frame 0, correct?
>> 
>> Thus, AFAICS, whenever klp_try_complete_transition() finds a
>> STACK_FRAMES_REGS_MARKER in frame 0, it is known to be garbage, as you
>> said.
>
> I gave this about 20 tests and they were also all successful, what do
> you think?  (The log could probably use some trimming and cleanup.)  I
> think either solution would fix the issue.
>
> -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8--
>
> From edd3fb9e8a16d5a7ccc98759e9397f386f0e82ca Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@redhat.com>
> Date: Fri, 11 Jan 2019 10:25:26 -0500
> Subject: [PATCH] powerpc/livepatch: relax reliable stack tracer exception
>  check
>
> The ppc64le reliable stack tracer iterates over a given task's stack,
> verifying a set of conditions to determine whether the trace is
> 'reliable'.  These checks include the presence of any exception frames.
>
> We should be careful when inspecting the bottom-most stack frame (the
> first to be unwound), particularly for scheduled-out tasks.  As Nicolai
> Stange explains, "If I'm reading the code in _switch() correctly, the
> first frame is completely uninitialized except for the pointer back to
> the caller's stack frame."  If a previous do_IRQ() invocation, for
> example, has left a residual exception-marker on the first frame, the
> stack tracer would incorrectly report this task's trace as unreliable.
>

FWIW, it's not been do_IRQ() who wrote the exception marker, but it's
caller hardware_interrupt_common(), more specifically the
EXCEPTION_PROLOG_COMMON_3 part therof.


> save_stack_trace_tsk_reliable() already skips the first frame when
> verifying the saved Link Register.  It should do the same when looking
> for the STACK_FRAME_REGS_MARKER.  The task it is unwinding will be
> either 'current', in which case the tracer will have never been called
> from an exception path, or the task will be inactive and its first
> frame's contents will be uninitialized (aside from backchain pointer).

I thought about this a little more and can't see anything that would
prevent higher, i.e. non-_switch() frames to also alias with prior
exception frames. That STACK_FRAME_REGS_MARKER is written to a stack
frame's "parameter area" and most functions probably don't initialize
this either. So, AFAICS, higher stack frames could potentially be
affected by the very same problem.

I think the best solution would be to clear the STACK_FRAME_REGS_MARKER
upon exception return. I have a patch ready for that and will post it
after it has passed some basic testing -- hopefully later this day.

That being said, I still think that your patch should also get applied
in some form -- looking at unitialized memory is just not a good thing
to do.


>
> Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
> ---
>  arch/powerpc/kernel/stacktrace.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index e2c50b55138f..0793e75c45a6 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -84,6 +84,12 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
>  EXPORT_SYMBOL_GPL(save_stack_trace_regs);
>  
>  #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +/*
> + * This function returns an error if it detects any unreliable features of the
> + * stack.  Otherwise it guarantees that the stack trace is reliable.
> + *
> + * If the task is not 'current', the caller *must* ensure the task is inactive.
> + */
>  int
>  save_stack_trace_tsk_reliable(struct task_struct *tsk,
>  				struct stack_trace *trace)
> @@ -143,7 +149,7 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk,
>  			return 1;
>  
>  		/* Mark stacktraces with exception frames as unreliable. */
> -		if (sp <= stack_end - STACK_INT_FRAME_SIZE &&
> +		if (!firstframe && sp <= stack_end - STACK_INT_FRAME_SIZE &&
>  		    stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>  			return 1;
>  		}

I would perhaps not limit this to the STACK_FRAME_REGS_MARKER, but also
not emit the ip obtained from the first frame into the resulting trace.

I.e., how about moving all the sp/newsp handling to the beginning of the
loop and doing an 'if (firstframe) continue;' right after that?

Thanks,

Nicolai


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

WARNING: multiple messages have this Message-ID (diff)
From: Nicolai Stange <nstange@suse.de>
To: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>,
	linux-kernel@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
	Jiri Kosina <jkosina@suse.cz>,
	live-patching@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Mon, 14 Jan 2019 08:21:40 +0100	[thread overview]
Message-ID: <877ef7bt6j.fsf@suse.de> (raw)
In-Reply-To: <20190114040937.GA6739@redhat.com> (Joe Lawrence's message of "Sun, 13 Jan 2019 23:09:37 -0500")

Joe Lawrence <joe.lawrence@redhat.com> writes:

> On Fri, Jan 11, 2019 at 08:51:54AM +0100, Nicolai Stange wrote:
>> Joe Lawrence <joe.lawrence@redhat.com> writes:
>> 
>> > On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote:
>
> [ ... snip ... ]
>
>> >> For testing, could you try whether clearing the word at STACK_FRAME_MARKER
>> >> from _switch() helps?
>> >> 
>> >> I.e. something like (completely untested):
>> >
>> > I'll kick off some builds tonight and try to get tests lined up
>> > tomorrow.  Unfortunately these take a bit of time to run setup, schedule
>> > and complete, so perhaps by next week.
>> 
>> Ok, that's probably still a good test for confirmation, but the solution
>> you proposed below is much better.
>
> Good news, 40 tests (RHEL-7 backport) with clearing out the
> STACK_FRAME_MARKER were all successful.  Previously I would see stalled
> livepatch transitions due to the unreliable report in ~10% of test
> cases.
>
>> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> >> index 435927f549c4..b747d0647ec4 100644
>> >> --- a/arch/powerpc/kernel/entry_64.S
>> >> +++ b/arch/powerpc/kernel/entry_64.S
>> >> @@ -596,6 +596,10 @@ _GLOBAL(_switch)
>> >>  	SAVE_8GPRS(14, r1)
>> >>  	SAVE_10GPRS(22, r1)
>> >>  	std	r0,_NIP(r1)	/* Return to switch caller */
>> >> +
>> >> +	li	r23,0
>> >> +	std	r23,96(r1)	/* 96 == STACK_FRAME_MARKER * sizeof(long) */
>> >> +
>> >>  	mfcr	r23
>> >>  	std	r23,_CCR(r1)
>> >>  	std	r1,KSP(r3)	/* Set old stack pointer */
>> >> 
>> >> 
>> >
>> > This may be sufficient to avoid the condition, but if the contents of
>> > frame 0 are truely uninitialized (not to be trusted), should the
>> > unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER),
>> > aside from the LR and other stack size geometry sanity checks?
>> 
>> That's a very good point: we'll only ever be examining scheduled out tasks
>> and current (which at that time is running klp_try_complete_transition()).
>> 
>> current won't be in an interrupt/exception when it's walking its own
>> stack. And scheduled out tasks can't have an exception/interrupt frame
>> as their frame 0, correct?
>> 
>> Thus, AFAICS, whenever klp_try_complete_transition() finds a
>> STACK_FRAMES_REGS_MARKER in frame 0, it is known to be garbage, as you
>> said.
>
> I gave this about 20 tests and they were also all successful, what do
> you think?  (The log could probably use some trimming and cleanup.)  I
> think either solution would fix the issue.
>
> -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8--
>
> From edd3fb9e8a16d5a7ccc98759e9397f386f0e82ca Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@redhat.com>
> Date: Fri, 11 Jan 2019 10:25:26 -0500
> Subject: [PATCH] powerpc/livepatch: relax reliable stack tracer exception
>  check
>
> The ppc64le reliable stack tracer iterates over a given task's stack,
> verifying a set of conditions to determine whether the trace is
> 'reliable'.  These checks include the presence of any exception frames.
>
> We should be careful when inspecting the bottom-most stack frame (the
> first to be unwound), particularly for scheduled-out tasks.  As Nicolai
> Stange explains, "If I'm reading the code in _switch() correctly, the
> first frame is completely uninitialized except for the pointer back to
> the caller's stack frame."  If a previous do_IRQ() invocation, for
> example, has left a residual exception-marker on the first frame, the
> stack tracer would incorrectly report this task's trace as unreliable.
>

FWIW, it's not been do_IRQ() who wrote the exception marker, but it's
caller hardware_interrupt_common(), more specifically the
EXCEPTION_PROLOG_COMMON_3 part therof.


> save_stack_trace_tsk_reliable() already skips the first frame when
> verifying the saved Link Register.  It should do the same when looking
> for the STACK_FRAME_REGS_MARKER.  The task it is unwinding will be
> either 'current', in which case the tracer will have never been called
> from an exception path, or the task will be inactive and its first
> frame's contents will be uninitialized (aside from backchain pointer).

I thought about this a little more and can't see anything that would
prevent higher, i.e. non-_switch() frames to also alias with prior
exception frames. That STACK_FRAME_REGS_MARKER is written to a stack
frame's "parameter area" and most functions probably don't initialize
this either. So, AFAICS, higher stack frames could potentially be
affected by the very same problem.

I think the best solution would be to clear the STACK_FRAME_REGS_MARKER
upon exception return. I have a patch ready for that and will post it
after it has passed some basic testing -- hopefully later this day.

That being said, I still think that your patch should also get applied
in some form -- looking at unitialized memory is just not a good thing
to do.


>
> Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
> ---
>  arch/powerpc/kernel/stacktrace.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index e2c50b55138f..0793e75c45a6 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -84,6 +84,12 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
>  EXPORT_SYMBOL_GPL(save_stack_trace_regs);
>  
>  #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +/*
> + * This function returns an error if it detects any unreliable features of the
> + * stack.  Otherwise it guarantees that the stack trace is reliable.
> + *
> + * If the task is not 'current', the caller *must* ensure the task is inactive.
> + */
>  int
>  save_stack_trace_tsk_reliable(struct task_struct *tsk,
>  				struct stack_trace *trace)
> @@ -143,7 +149,7 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk,
>  			return 1;
>  
>  		/* Mark stacktraces with exception frames as unreliable. */
> -		if (sp <= stack_end - STACK_INT_FRAME_SIZE &&
> +		if (!firstframe && sp <= stack_end - STACK_INT_FRAME_SIZE &&
>  		    stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>  			return 1;
>  		}

I would perhaps not limit this to the STACK_FRAME_REGS_MARKER, but also
not emit the ip obtained from the first frame into the resulting trace.

I.e., how about moving all the sp/newsp handling to the beginning of the
loop and doing an 'if (firstframe) continue;' right after that?

Thanks,

Nicolai


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

  reply	other threads:[~2019-01-14  7:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-10 21:14 ppc64le reliable stack unwinder and scheduled tasks Joe Lawrence
2019-01-10 21:14 ` Joe Lawrence
2019-01-11  0:00 ` Nicolai Stange
2019-01-11  0:00   ` Nicolai Stange
2019-01-11  1:08   ` Joe Lawrence
2019-01-11  1:08     ` Joe Lawrence
2019-01-11  7:51     ` Nicolai Stange
2019-01-11  7:51       ` Nicolai Stange
2019-01-14  4:09       ` Joe Lawrence
2019-01-14  4:09         ` Joe Lawrence
2019-01-14  7:21         ` Nicolai Stange [this message]
2019-01-14  7:21           ` Nicolai Stange
2019-01-14 16:46           ` Joe Lawrence
2019-01-14 16:46             ` Joe Lawrence
2019-01-14 17:09             ` Josh Poimboeuf
2019-01-14 17:09               ` Josh Poimboeuf
2019-01-14 17:54               ` Joe Lawrence
2019-01-14 17:54                 ` Joe Lawrence
2019-01-12  1:09 ` Balbir Singh
2019-01-12  1:09   ` Balbir Singh
2019-01-12  8:45   ` Segher Boessenkool
2019-01-13 12:33     ` Balbir Singh
2019-01-13 13:05       ` Torsten Duwe
2019-01-13 13:05         ` Torsten Duwe
2019-01-17 14:52       ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877ef7bt6j.fsf@suse.de \
    --to=nstange@suse.de \
    --cc=bsingharora@gmail.com \
    --cc=duwe@lst.de \
    --cc=jkosina@suse.cz \
    --cc=joe.lawrence@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=live-patching@vger.kernel.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.