Re: [PATCH -next] powerpc: add support for syscall stack randomization

From: Kees Cook <keescook@chromium.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: benh@kernel.crashing.org, christophe.leroy@csgroup.eu,
	mark.rutland@arm.com, mpe@ellerman.id.au, paulus@samba.org,
	tglx@linutronix.de, Xiu Jianfeng <xiujianfeng@huawei.com>,
	linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH -next] powerpc: add support for syscall stack randomization
Date: Tue, 10 May 2022 09:19:56 -0700	[thread overview]
Message-ID: <202205100917.5480D91@keescook> (raw)
In-Reply-To: <1652173338.7bltwybi0c.astroid@bobo.none>

On Tue, May 10, 2022 at 07:23:46PM +1000, Nicholas Piggin wrote:
> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
> > Add support for adding a random offset to the stack while handling
> > syscalls. This patch uses mftb() instead of get_random_int() for better
> > performance.
> 
> Hey, very nice.

Agreed! :)

> > [...]
> > @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >  
> >  	kuap_lock();
> >  
> > +	add_random_kstack_offset();
> >  	regs->orig_gpr3 = r3;
> >  
> >  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> 
> This looks like the right place. I wonder why other interrupts don't
> get the same treatment. Userspace can induce the kernel to take a 
> synchronous interrupt, or wait for async ones. Smaller surface area 
> maybe but certain instruction emulation for example could result in
> significant logic that depends on user state. Anyway that's for
> hardening gurus to ponder.

I welcome it being used for any userspace controllable entry to the
kernel! :)

Also, related, have you validated the result using the LKDTM test?
See tools/testing/selftests/lkdtm/stack-entropy.sh

> 
> > @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
> >  
> >  	/* Restore user access locks last */
> >  	kuap_user_restore(regs);
> > +	choose_random_kstack_offset(mftb() & 0xFF);
> >  
> >  	return ret;
> >  }
> 
> So this seems to be what x86 and s390 do, but why are we choosing a
> new offset for every interrupt when it's only used on a syscall?
> I would rather you do what arm64 does and just choose the offset
> at the end of system_call_exception.
> 
> I wonder why the choose is separated from the add? I guess it's to
> avoid a data dependency for stack access on an expensive random
> function, so that makes sense (a comment would be nice in the
> generic code).

How does this read? I can send a "real" patch if it looks good:

diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1468caf001c0..ad3e80275c74 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -40,8 +40,11 @@ DECLARE_PER_CPU(u32, kstack_offset);
  */
 #define KSTACK_OFFSET_MAX(x)	((x) & 0x3FF)
 
-/*
- * These macros must be used during syscall entry when interrupts and
+/**
+ * add_random_kstack_offset - Increase stack utilization by previously
+ *			      chosen random offset
+ *
+ * This should be used in the syscall entry path when interrupts and
  * preempt are disabled, and after user registers have been stored to
  * the stack.
  */
@@ -55,6 +58,24 @@ DECLARE_PER_CPU(u32, kstack_offset);
 	}								\
 } while (0)
 
+/**
+ * choose_random_kstack_offset - Choose the random offsset for the next
+ *				 add_random_kstack_offset()
+ *
+ * This should only be used during syscall exit when interrupts and
+ * preempt are disabled, and before user registers have been restored
+ * from the stack. This is done to frustrate attack attempts from
+ * userspace to learn the offset:
+ * - Maximize the timing uncertainty visible from userspace: if the
+ *   the offset is chosen at syscall entry, userspace has much more
+ *   control over the timing between chosen offsets. "How long will we
+ *   be in kernel mode?" tends to be more difficult to know than "how
+ *   long will be be in user mode?"
+ * - Reduce the lifetime of the new offset sitting in memory during
+ *   kernel mode execution. Exposures of "thread-local" (e.g. current,
+ *   percpu, etc) memory contents tends to be easier than arbitrary
+ *   location memory exposures.
+ */
 #define choose_random_kstack_offset(rand) do {				\
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\


-- 
Kees Cook