Date: Mon, 9 May 2022 13:22:00 +0200
From: Peter Zijlstra
To: Kees Cook
Cc: Peter Collingbourne, Josh Poimboeuf, Joao Moreira,
    linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
    andrew.cooper3@citrix.com, samitolvanen@google.com,
    mark.rutland@arm.com, hjl.tools@gmail.com,
    alyssa.milburn@linux.intel.com, ndesaulniers@google.com,
    gabriel.gomes@linux.intel.com, rick.p.edgecombe@intel.com
Subject: Re: [RFC PATCH 01/11] x86: kernel FineIBT
In-Reply-To: <202205080033.82AB3703C3@keescook>

On Sun, May 08, 2022 at 01:29:13AM -0700, Kees Cook wrote:
> On Wed, May 04, 2022 at 08:16:57PM +0200, Peter Zijlstra wrote:
> >     FineIBT                                         kCFI
> >
> > __fineibt_\hash:
> >     xor     \hash, %r10     # 7
> >     jz      1f              # 2
> >     ud2                     # 2
> > 1:  ret                     # 1
> >     int3                    # 1
> >
> >
> > __cfi_\sym:                                     __cfi_\sym:
> >                                                     int3; int3                       # 2
> >     endbr                   # 4                     mov     \hash, %eax              # 5
> >     call    __fineibt_\hash # 5                     int3; int3                       # 2
> > \sym:                                           \sym:
> >     ...                                             ...
> >
> >
> > caller:                                         caller:
> >     movl    \hash, %r10d    # 6                     cmpl    \hash, -6(%r11)          # 8
> >     sub     $9, %r11        # 4                     je      1f                       # 2
> >     call    *%r11           # 3                     ud2                              # 2
> >     .nop 4                  # 4 (or fixup r11)      call    __x86_indirect_thunk_r11 # 5
>
> This looks good!
>
> And just to double-check my understanding here... \sym is expected to
> start with endbr with IBT + kCFI?

Ah, the thinking was that 'if IBT then FineIBT', so the combination of
kCFI and IBT is of no concern. And since FineIBT will have the ENDBR in
the __cfi_\sym thing, \sym will not need it.

But thinking about this now I suppose __nocfi call symbols will still
need the ENDBR. Objtool IBT validation would need to either find ENDBR
or a matching __cfi_\sym I suppose.

So I was talking to Joao on IRC the other day, and I realized that if
kCFI generates code as per the above, then we can do FineIBT purely
in-kernel.

That is: have objtool generate a section of __cfi_\sym locations. Then
use the .retpoline_sites and .cfi_sites to rewrite kCFI into the FineIBT
form in multiple passes:

 - read all the __cfi_\sym sites and collect all unique hash values
 - allocate (module) memory and write __fineibt_\hash functions for each
   unique hash value found
 - rewrite callers; nop out kCFI
 - rewrite all __cfi_\sym
 - rewrite all callers
 - enable IBT

And the same on module load I suppose.

But I've only thought about this, not actually implemented it, so who
knows what surprises are lurking there :-)

> Random extra thoughts... feel free to ignore. :) Given that both CFI
> schemes depend on an attacker not being able to construct an executable
> memory region that either starts with endbr (for FineIBT) or starts with
> hash & 2 bytes (for kCFI), we should likely take another look at where
> the kernel uses PAGE_KERNEL_EXEC.
>
> It seems non-specialized use is entirely done via module_alloc(). Obviously
> modules need to stay as-is. So we're left with other module_alloc()
> callers: BPF JIT, ftrace, and kprobes.
>
> Perhaps enabling CFI should tie bpf_jit_harden (which performs constant
> blinding) to the value of bpf_jit_enable? (i.e. either use BPF VM which
> reads from non-exec memory, or use BPF JIT with constant blinding.)
>
> I *think* all the kprobes and ftrace stuff ends up using constructed
> direct calls, though, yes?
> So if we did bounds checking, we could
> "exclude" them as well as the BPF JIT. Though I'm not sure how
> controllable the content written to the kprobes and ftrace regions are,
> though?

Both ftrace and kprobe only write fairly simple trampolines based off of
a template. Neither has indirect calls.

> For exclusion, we could separate actual modules from the other
> module_alloc() users by maybe allocating in opposite directions from the
> randomized offset and check indirect calls against the kernel text bounds
> and the new modules-only bounds. Sounds expensive, though. Maybe PKS,
> but I can't imagine 2 MSR writes per indirect call would be fast.

Hmm... I'm not sure what problem you're trying to solve.