On Tue, Feb 23, 2021 at 12:12:43PM -0600, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Unwinder changes
> ================

This is making several different changes so should be split into a patch
series - for example the change to terminate on a specific function
pointer rather than NULL and the changes to the exception/interupt
detection should be split.  Please see submitting-patches.rst for some
discussion about how to split things up.  In general if you've got a
changelog enumerating a number of different changes in a patch that's a
warning sign that it might be good split things up.

You should also copy the architecture maintainers (Catalin and Will) on
any arch/arm64 submissions.

> 	Unwinder return value
> 	=====================
> 
> 	Currently, the unwinder returns -EINVAL for stack trace termination
> 	as well as stack trace error. Return -ENOENT for stack trace
> 	termination and -EINVAL for error to disambiguate. This idea has
> 	been borrowed from Mark Brown.

You could just include my patch for this in your series.

> Reliable stack trace function
> =============================
> 
> Implement arch_stack_walk_reliable(). This function walks the stack like
> the existing stack trace functions with a couple of additional checks:
> 
> 	Return address check
> 	--------------------
> 
> 	For each frame, check the return address to see if it is a
> 	proper kernel text address. If not, return -EINVAL.
> 
> 	Exception frame check
> 	---------------------
> 
> 	Check each frame to see if it is an EL1 exception frame. If it is,
> 	return -EINVAL.

Again, this should be at least one separate patch.  How does this ensure
that we don't have any issues with any of the various probe mechanisms?
If there's no need to explicitly check anything that should be called
out in the changelog.

Since all these changes are mixed up this is a fairly superficial
review of the actual code.

> +static notrace struct pt_regs *get_frame_regs(struct task_struct *task,
> +					      struct stackframe *frame)
> +{
> +	unsigned long stackframe, regs_start, regs_end;
> +	struct stack_info info;
> +
> +	stackframe = frame->prev_fp;
> +	if (!stackframe)
> +		return NULL;
> +
> +	(void) on_accessible_stack(task, stackframe, &info);

Shouldn't we return NULL if we are not on an accessible stack?

> +static notrace int update_frame(struct task_struct *task,
> +				struct stackframe *frame)

This function really needs some documentation, the function is just
called update_frame() which doesn't say what sort of updates it's
supposed to do and most of the checks aren't explained, not all of them
are super obvious.

> +{
> +	unsigned long lsb = frame->fp & 0xf;
> +	unsigned long fp = frame->fp & ~lsb;
> +	unsigned long pc = frame->pc;
> +	struct pt_regs *regs;
> +
> +	frame->exception_frame = false;
> +
> +	if (fp == (unsigned long) arm64_last_frame &&
> +	    pc == (unsigned long) arm64_last_func)
> +		return -ENOENT;
> +
> +	if (!lsb)
> +		return 0;
> +	if (lsb != 1)
> +		return -EINVAL;
> +
> +	/*
> +	 * This looks like an EL1 exception frame.

For clarity it would be good to spell out the properties of an EL1
exception frame.  It is not clear to me why we don't reference the frame
type information the unwinder already records as part of these checks.

In general, especially for the bits specific to reliable stack trace, I
think we want to err on the side of verbosity here so that it is crystal
clear what all the checks are supposed to be doing and it's that much
easier to tie everything through to the requirements document.