Re: [RFC PATCH 0/3] arm64: Implement reliable stack trace

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Julien Thierry <jthierry@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Mark Brown <broonie@kernel.org>, Miroslav Benes <mbenes@suse.cz>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH 0/3] arm64: Implement reliable stack trace
Date: Mon, 1 Feb 2021 20:29:52 -0600
Message-ID: <44a32a13-f744-b591-a68d-19cf665e5495@linux.microsoft.com>
In-Reply-To: <20210201230032.syuv2nrbbureszbu@treble>

On 2/1/21 5:00 PM, Josh Poimboeuf wrote:
> On Mon, Feb 01, 2021 at 03:38:53PM -0600, Madhavan T. Venkataraman wrote:
>> So, I have a few questions from a livepatch perspective.
>>
>> For livepatch, the kernel makes sure that a task is not running when its stack is checked,
>> correct?
> 
> Correct.
> 
>> The only possibility I can think of is that the task could have received an
>> interrupt and could have been preempted at the end of the interrupt. The interrupt
>> could have happened during the frame pointer prolog or epilog. Is this the problem case
>> for livepatch?
>>
>> If the unwinder could check a flag in the task that indicates that the task was interrupted,
>> the unwinder could declare the stack trace unreliable. E.g., a (hacky) solution could
>> be to set and clear the flag in preempt_schedule_irq() which takes a task off a CPU
>> when it is preempted at the end of an interrupt. The flag would remain set while the task is not
>> on a CPU.
>>
>> Similarly, for exceptions, can we set a flag in a task indicating that it is processing
>> an exception? Is there a top level exception handler where we can do this? Is there common
>> code that exception handlers use where we can set this? Or, can we deduce this from ptregs->pstate
>> that is saved for the task?
>>
>> Mind you, the flag is advisory. If the unwinder has some way to unwind through an exception,
>> more power to it.
> 
> For x86 (frame pointers), entry code uses ENCODE_FRAME_POINTER, which
> creates a special pt_regs frame.
> 
> When the reliable unwinder sees the encoded regs on the stack, it knows
> it encountered some asynchronous event, like preemption, and it marks
> the stack unreliable.
> 

Ok. Good to know.
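
To make the encoded-regs idea concrete, here is a minimal user-space sketch
of a frame pointer walk that gives up as soon as it meets an encoded frame
pointer. It assumes the encoding is "low bit set", which is how I understand
the x86-64 variant of ENCODE_FRAME_POINTER to work. struct toy_frame,
fp_is_encoded() and toy_unwind_reliable() are hypothetical names made up for
illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy frame record: the caller's frame pointer, then the return address. */
struct toy_frame {
	uintptr_t next_fp;	/* 0 terminates the chain */
	uintptr_t ret_addr;
};

/* Assumption: an encoded frame pointer (pt_regs marker) has its low bit set. */
static bool fp_is_encoded(uintptr_t fp)
{
	return (fp & 1) != 0;
}

/*
 * Walk the frame chain starting at fp and record return addresses.
 * Returns true only if the whole chain was walked without meeting an
 * encoded frame pointer and without overflowing the trace buffer.
 */
static bool toy_unwind_reliable(uintptr_t fp, uintptr_t *trace, size_t max,
				size_t *count)
{
	size_t n = 0;

	assert(trace != NULL && count != NULL);

	while (fp != 0) {
		if (fp_is_encoded(fp) || n == max) {
			/* Asynchronous event seen, or trace too deep. */
			*count = n;
			return false;
		}
		const struct toy_frame *f = (const struct toy_frame *)fp;
		trace[n++] = f->ret_addr;
		fp = f->next_fp;
	}

	*count = n;
	return true;
}
```

The walk stops at the encoded pointer rather than trying to decode the
registers behind it. That is the conservative choice for livepatch, where an
unreliable answer only means the task gets checked again later.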

>>> Given that, I think that assuming we must use a shadow stack for
>>> reliable unwinding would be jumping the gun.
>>>
>>
>> So, this is the problem I was considering. Let us say that a function properly sets up the
>> frame pointer at the beginning and properly restores it to the previous value when it
>> returns. But because of compiler bugs or some inline assembly code or other errant code,
>> the frame pointer gets modified in the middle of the function. Then, the function calls
>> another function. Then, the unwinder tries to unwind the stack. The unwinder has no
>> way of knowing that the frame pointer was modified. To tackle this problem, objtool
>> has to laboriously walk all the code paths and track every modification to the stack and
>> the frame pointer. And, if there are frame modifications, it has to fail the kernel build.
>> Did I understand it correctly?
> 
> Yes, though it generally warns instead of failing the build.  But we
> keep the warnings to zero as best we can.
> 

ok.

> BTW, the most common inline asm frame pointer bug we saw on x86 was a
> call instruction which got inserted by GCC before the prologue -- or
> sometimes there was no prologue because it was otherwise considered a
> leaf function.
> 

ok.
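
As a rough illustration of the kind of check being discussed, the sketch
below scans a straight-line list of made-up instructions and counts calls
made while the frame pointer is not set up. That covers both a call before
the prologue and a call after the frame pointer was clobbered. It is a toy
under heavy assumptions: the enum toy_op opcodes and toy_check_frame() are
hypothetical, and it does not follow branches, whereas objtool walks all
code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, highly simplified instruction kinds. */
enum toy_op {
	OP_PUSH_FP,	/* save the caller's frame pointer */
	OP_SET_FP,	/* point the frame pointer at the new frame */
	OP_CLOBBER_FP,	/* any other write to the frame pointer */
	OP_CALL,
	OP_POP_FP,	/* restore the caller's frame pointer */
	OP_RET,
	OP_OTHER,
};

/* Returns the number of calls made without a valid frame pointer. */
static size_t toy_check_frame(const enum toy_op *insns, size_t n)
{
	bool fp_saved = false;	/* caller's frame pointer is on the stack */
	bool fp_valid = false;	/* frame pointer describes this function */
	size_t warnings = 0;

	assert(insns != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		switch (insns[i]) {
		case OP_PUSH_FP:
			fp_saved = true;
			break;
		case OP_SET_FP:
			fp_valid = fp_saved;
			break;
		case OP_CLOBBER_FP:
			fp_valid = false;
			break;
		case OP_CALL:
			if (!fp_valid)
				warnings++;
			break;
		case OP_POP_FP:
			fp_saved = false;
			fp_valid = false;
			break;
		default:
			break;
		}
	}

	return warnings;
}
```

Even this toy has to model every instruction that can touch the frame
pointer, which hints at why the real analysis ends up needing full coverage.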

>> In these cases, the shadow stack can be used to unwind the stack. The shadow stack has
>> return addresses pushed on it. For livepatch purposes, this is good enough.
> 
> We try to fix every warning.  For the few warnings we whitelist instead
> of fixing, we make sure it's not a risk for live patching.
> 

ok.
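
For reference, the shadow stack idea quoted above can be modeled in a few
lines of user-space C. This is only a sketch: struct shadow_stack, the fixed
SHADOW_DEPTH and the three helpers are hypothetical, and a real
implementation would do the push and pop in the function prolog and epilog
rather than through explicit calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64		/* arbitrary size chosen for the sketch */

struct shadow_stack {
	uintptr_t ret[SHADOW_DEPTH];
	size_t top;		/* number of entries currently in use */
};

/* Prolog side: remember where this call will return to. */
static bool shadow_push(struct shadow_stack *ss, uintptr_t ret_addr)
{
	if (ss->top == SHADOW_DEPTH)
		return false;	/* overflow: caller must treat as unreliable */
	ss->ret[ss->top++] = ret_addr;
	return true;
}

/* Epilog side: drop the entry pushed by the matching prolog. */
static uintptr_t shadow_pop(struct shadow_stack *ss)
{
	assert(ss->top > 0);
	return ss->ret[--ss->top];
}

/* Unwind: copy the return addresses, innermost call first. */
static size_t shadow_unwind(const struct shadow_stack *ss, uintptr_t *trace,
			    size_t max)
{
	size_t n = 0;

	for (size_t i = ss->top; i > 0 && n < max; i--)
		trace[n++] = ss->ret[i - 1];
	return n;
}
```

The unwind here never looks at the frame pointer, which is the property
being argued for: the trace consists of return addresses only.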

>>>> Objtool will check for the no-ops. If they are present, it will replace the no-ops with
>>>> the shadow stack prolog and epilog. It can also check the frame pointer prolog and
>>>> epilog.
>>>
>>> I suspect this will interact poorly with patchable-function-entry, which
>>> prefixes each instrumentable function with some NOPs.
>>>
>>
>> Objtool knows if the kernel was configured with tracing. The compiler inserts a fixed,
>> known number of no-ops for tracing purposes. So, why is it difficult for objtool to
>> find the prolog/epilog no-ops?
> 
> Objtool tries to stay out of the code generation business.  Because then
> who's going to validate objtool's code :-)
> 
> And the compiler already does a decent job at generating it.
> 

The no-ops were just a wild idea of mine. I knew I would rue it as soon as I hit
the send button :-)

>>> I think at this point, we haven't gained anything from using a shadow
>>> stack, and I'd much rather we used objtool to gather any metadata needed
>>> to make unwinding reliable without mandating a shadow stack.
>>>
>>
>> I think we have gained something. Pushing the return addresses on the shadow stack
>> and using them to unwind means that objtool does not have to decode every single
>> instruction and track the changes to the stack and frame state. It also means
>> that the kernel build does not have to be failed when some frame modification is
>> detected by objtool.
> 
> How do we know the kernel has full and accurate CFI coverage?
> 

I need to study this before I can answer.

> The original version of objtool was an awk script which basically just
> crudely looked for the prologue/epilogue instructions.  It mostly
> worked.
> 
> But it wasn't 100%, and these days the prologue isn't always at the
> beginning, and the epilogue is usually buried in the middle.  And
> sometimes there are more stack pushes/pops hidden outside of the formal
> prologue/epilogue.  Not to mention asm code which does all kinds of
> crazy things.  And other edge cases, like leaf functions which don't
> require frame pointers, and alternatives patching/paravirt/etc which can
> muck with the stack layout at runtime.  Eventually we realized a "full
> coverage" objtool is the wisest approach.
> 

Yes. I studied the objtool code. It does a fantastic job for x86. I suspect
it took a lot of time and a lot of work to get it right. It is
just that adding complete static analysis for an arch is daunting and
time consuming. How would we ever prove to the community that we are
truly done?

I am attempting to define a set of conditions such that, if they are all
met, the stack is reliable, and otherwise it is treated as unreliable,
without having to analyze all the different ways it can be unreliable.
Easier said than done!
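
In code form, the shape I have in mind is a conservative check where any
failed condition means "unreliable". The sketch below is purely
illustrative. The conditions are examples drawn from this thread (task not
running, not preempted at the end of an interrupt, not inside an exception,
unwind reached the expected final frame), not a proven or complete list, and
struct toy_task_state and toy_stack_is_reliable() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of a task; these are not real task_struct fields. */
struct toy_task_state {
	bool on_cpu;			/* task is currently running */
	bool preempted_in_irq;		/* advisory flag: preempted at irq exit */
	bool in_exception;		/* advisory flag: handling an exception */
	bool reached_final_frame;	/* unwind ended at the expected last frame */
};

/*
 * Reliable only if every condition holds. Anything unexpected is reported
 * as unreliable instead of being analyzed further.
 */
static bool toy_stack_is_reliable(const struct toy_task_state *t)
{
	assert(t != NULL);

	if (t->on_cpu)
		return false;
	if (t->preempted_in_irq)
		return false;
	if (t->in_exception)
		return false;
	return t->reached_final_frame;
}
```

The hard part, of course, is showing that whatever list of conditions we end
up with is sufficient.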

> Also, a simpler version of objtool isn't really an option on the x86
> side, since we now have a lot of other features relying on its full
> coverage.  Other than the decoder, most of the objtool logic is
> arch-independent.
> 

ok.

Thanks for all the info. Appreciate it.

Madhavan
