Re: [PATCH] x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)

From: ebiederm@xmission.com (Eric W. Biederman)
To: hpa@zytor.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alexandru Chirvasitu <achirvasub@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andy Lutomirski <luto@kernel.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>, Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
Date: Thu, 28 Dec 2017 18:30:21 -0600	[thread overview]
Message-ID: <87efne4bqa.fsf@xmission.com> (raw)
In-Reply-To: <2B58A32E-73ED-4D1C-9C12-28941B321FE4@zytor.com> (hpa@zytor.com's message of "Thu, 28 Dec 2017 15:23:32 -0800")

hpa@zytor.com writes:

> On December 28, 2017 2:47:47 PM PST, ebiederm@xmission.com wrote:
>>Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>>> From: Linus Torvalds <torvalds@linux-foundation.org>
>>> Date: Wed, 27 Dec 2017 11:41:30 -0800
>>> Subject: [PATCH] x86-32: fix kexec with stack canary
>>(CONFIG_CC_STACKPROTECTOR)
>>>
>>> Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned
>>up
>>> and unified the IDT invalidation that existed in a couple of places. 
>>It
>>> changed no actual real code.
>>>
>>> Despite not changing any actual real code, it _did_ change code
>>> generation: by implementing the common idt_invalidate() function in
>>> archx86/kernel/idt.c, it made the use of the function in
>>> arch/x86/kernel/machine_kexec_32.c be a real function call rather
>>than
>>> an (accidental) inlining of the function.
>>>
>>> That, in turn, exposed two issues:
>>>
>>>  - in load_segments(), we had incorrectly reset all the segment
>>>    registers, which then made the stack canary load (which gcc does
>>>    using offset of %gs) cause a trap.  Instead of %gs pointing to the
>>>    stack canary, it will be the normal zero-based kernel segment, and
>>>    the stack canary load will take a page fault at address 0x14.
>>>
>>>  - to make this even harder to debug, we had invalidated the GDT just
>>>    before calling idt_invalidate(), which meant that the fault
>>happened
>>>    with an invalid GDT, which in turn causes a triple fault and
>>>    immediate reboot.
>>>
>>> Fix this by
>>>
>>>  (a) not reloading the special segments in load_segments(). We
>>currently
>>>      don't do any percpu accesses (which would require %fs on x86-32)
>>in
>>>      this area, but there's no reason to think that we might not want
>>to
>>>      do them, and like %gs, it's pointless to break it.
>>>
>>>  (b) doing idt_invalidate() before invalidating the GDT, to keep
>>things
>>>      at least _slightly_ more debuggable for a bit longer. Without a
>>>      IDT, traps will not work. Without a GDT, traps also will not
>>work,
>>>      but neither will any segment loads etc. So in a very real sense,
>>>      the GDT is even more core than the IDT.
>>>
>>> Reported-and-tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
>>> Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation")
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Andy Lutomirski <luto@kernel.org>
>>> Cc: Peter Anvin <hpa@zytor.com>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> ---
>>>
>>> I wrote "Reported-and-tested-by: Alexandru" because while this isn't 
>>> exactly the same patch as anything Alexandru tested, it's pretty
>>close, 
>>> and I'm pretty sure this version will fix his issues too.
>>>
>>> I decided to try to just do the minimal changes: the GDT invalidation
>>last 
>>> (because of the debugging) and _only_ removing the resetting of fs/gs
>>
>>> rather than removing load_segments() entirely.
>>>
>>> I think making idt_invalidate() be inline would be a good thing as
>>well, 
>>> and I do think that all those "phys_to_virt(0)" things are garbage,
>>but I 
>>> also think they are independent issues, so I didn't touch any of
>>that. 
>>>
>>> I'm assuming I'll get this patch back through the x86 tree, and will
>>not 
>>> be applying it to my own git tree unless the x86 people ask me to.
>>>
>>> Comments?
>>
>>There is one significant problem with this patch.  It changes the ABI
>>that kexec provides to the next kernel.
>>
>>That ABI is that the segments will be set to a well defined value.
>>That value is flat 32bit segments with a base address of 0.
>>
>>By removing %fs and %gs from load_segments they are now effectively
>>random undefined values, to the next kernel.
>>
>>I don't know if anything actually cares.  But if they do they are now
>>broken.  It is easy enough to preserve that invariant I don't see
>>a point in risking potential breaking and looking to see if we have
>>actually broken the ABI.
>>
>>It feels like this is something we should move into assembly rather
>>than attempting to cater to the changing evironment of C code in the
>>kernel.  Or if not we need a big fat comment be very very careful
>>this code is special.
>>
>>Eric
>>
>>
>>>  arch/x86/kernel/machine_kexec_32.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/machine_kexec_32.c
>>b/arch/x86/kernel/machine_kexec_32.c
>>> index 00bc751c861c..edfede768688 100644
>>> --- a/arch/x86/kernel/machine_kexec_32.c
>>> +++ b/arch/x86/kernel/machine_kexec_32.c
>>> @@ -48,8 +48,6 @@ static void load_segments(void)
>>>  		"\tmovl $"STR(__KERNEL_DS)",%%eax\n"
>>>  		"\tmovl %%eax,%%ds\n"
>>>  		"\tmovl %%eax,%%es\n"
>>> -		"\tmovl %%eax,%%fs\n"
>>> -		"\tmovl %%eax,%%gs\n"
>>>  		"\tmovl %%eax,%%ss\n"
>>>  		: : : "eax", "memory");
>>>  #undef STR
>>> @@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image)
>>>  	 * The gdt & idt are now invalid.
>>>  	 * If you want to load them you must set up your own idt & gdt.
>>>  	 */
>>> -	set_gdt(phys_to_virt(0), 0);
>>>  	idt_invalidate(phys_to_virt(0));
>>> +	set_gdt(phys_to_virt(0), 0);
>>>  
>>>  	/* now call it */
>>>  	image->start = relocate_kernel_ptr((unsigned long)image->head,
>
> The ABI the kernel requires on entry is also documented, and we should
> stick to that.

Wrong interface this, does not transfer directly to a linux kernel.  This
transfers to a shim that starts a linux kernel or something else.

It is way past time to be having design discussions about what this
interface should do.  It is more than a decade old.

> That being said, the bottom line is to just stop putting these kinds
> of final handovers into C and just hope the compiler (or
> tracing/debugging developers) doesn't randomly break at some thing.

In general I agree.  That makes the code a little less approachable, but
it would seem to remove the chance of surprise interactions even more.

Eric