Re: [RFC PATCH v1 1/1] arm64: Unwinder enhancements for reliable stack trace

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: broonie@kernel.org, jpoimboe@redhat.com, jthierry@redhat.com,
	linux-arm-kernel@lists.infradead.org,
	live-patching@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 1/1] arm64: Unwinder enhancements for reliable stack trace
Date: Wed, 24 Feb 2021 13:34:09 -0600	[thread overview]
Message-ID: <4a96b31d-0d16-1f12-fa64-5fdae64cd2d1@linux.microsoft.com> (raw)
In-Reply-To: <20210224121716.GE50741@C02TD0UTHF1T.local>

On 2/24/21 6:17 AM, Mark Rutland wrote:
> Hi Madhavan,
> 
> As Mark Brown says, I think this needs to be split into several
> patches. i have some comments on the general approach, but I'll save
> in-depth review until this has been split.
> 

OK.

> On Tue, Feb 23, 2021 at 12:12:43PM -0600, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> Unwinder changes
>> ================
>>
>> 	Termination
>> 	===========
>>
>> 	Currently, the unwinder terminates when both the FP (frame pointer)
>> 	and the PC (return address) of a frame are 0. But a frame could get
>> 	corrupted and zeroed. There needs to be a better check.
>>
>> 	The following special terminating frame and function have been
>> 	defined for this purpose:
>>
>> 	const u64    arm64_last_frame[2] __attribute__ ((aligned (16)));
>>
>> 	void arm64_last_func(void)
>> 	{
>> 	}
>>
>> 	So, set the FP to arm64_last_frame and the PC to arm64_last_func in
>> 	the bottom most frame.
> 
> My expectation was that we'd do this per-task, creating an empty frame
> record (i.e. with fp=NULL and lr=NULL) on the task's stack at the
> instant it was created, and chaining this into x29. That way the address
> is known (since it can be derived from the task), and the frame will
> also implicitly check that the callchain terminates on the task stack
> without loops. That also means that we can use it to detect the entry
> code going wrong (e.g. if the SP gets corrupted), since in that case the
> entry code would place the record at a different location.
> 

That is exactly what this is doing. arm64_last_frame[] is a marker frame
that contains fp=0 and pc=0.

>>
>> 	Exception/Interrupt detection
>> 	=============================
>>
>> 	An EL1 exception renders the stack trace unreliable as it can happen
>> 	anywhere including the frame pointer prolog and epilog. The
>> 	unwinder needs to be able to detect the exception on the stack.
>>
>> 	Currently, the EL1 exception handler sets up pt_regs on the stack
>> 	and chains pt_regs->stackframe with the other frames on the stack.
>> 	But, the unwinder does not know where this exception frame is in
>> 	the stack trace.
>>
>> 	Set the LSB of the exception frame FP to allow the unwinder to
>> 	detect the exception frame. When the unwinder detects the frame,
>> 	it needs to make sure that it is really an exception frame and
>> 	not the result of any stack corruption.
> 
> I'm not keen on messing with the encoding of the frame record as this
> will break external unwinders (e.g. using GDB on a kernel running under
> QEMU). I'd rather that we detected the exception boundary based on the
> LR, similar to what we did in commit:
> 

OK. I will take a look at the commit you mentioned.

>   7326749801396105 ("arm64: unwind: reference pt_regs via embedded stack frame")
> 
> ... I reckon once we've moved the last of the exception triage out to C
> this will be relatively simple, since all of the exception handlers will
> look like:
> 
> | SYM_CODE_START_LOCAL(elX_exception)
> | 	kernel_entry X
> | 	mov	x0, sp
> | 	bl	elX_exception_handler
> | 	kernel_exit X
> | SYM_CODE_END(elX_exception)
> 
> ... and so we just need to identify the set of elX_exception functions
> (which we'll never expect to take exceptions from directly). We could be
> strict and reject unwinding into arbitrary bits of the entry code (e.g.
> if we took an unexpected exception), and only permit unwinding to the
> BL.
> 
>> 	It can do this if the FP and PC are also recorded elsewhere in the
>> 	pt_regs for comparison. Currently, the FP is also stored in
>> 	regs->regs[29]. The PC is stored in regs->pc. However, regs->pc can
>> 	be changed by lower level functions.
>>
>> 	Create a new field, pt_regs->orig_pc, and record the return address
>> 	PC there. With this, the unwinder can validate the exception frame
>> 	and set a flag so that the caller of the unwinder can know when
>> 	an exception frame is encountered.
> 
> I don't understand the case you're trying to solve here. When is
> regs->pc changed in a way that's problematic?
> 

For instance, I used a test driver in which the driver calls a function
pointer which is NULL. The low level fault handler sends a signal to the
task. Looks like it changes regs->pc for this. When I dump the stack
from the low level handler, the comparison with regs->pc does not work.
But comparison with regs->orig_pc works.

>> 	Unwinder return value
>> 	=====================
>>
>> 	Currently, the unwinder returns -EINVAL for stack trace termination
>> 	as well as stack trace error. Return -ENOENT for stack trace
>> 	termination and -EINVAL for error to disambiguate. This idea has
>> 	been borrowed from Mark Brown.
> 
> IIRC Mark Brown already has a patch for this (and it could be queued on
> its own if it hasn't already been).
> 

I saw it. That is fine.

Thanks.

Madhavan