From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35ACAC433EF for ; Wed, 27 Apr 2022 17:32:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244310AbiD0RfQ (ORCPT ); Wed, 27 Apr 2022 13:35:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55280 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244266AbiD0RfA (ORCPT ); Wed, 27 Apr 2022 13:35:00 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E73551FD4CD for ; Wed, 27 Apr 2022 10:31:46 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A00F8ED1; Wed, 27 Apr 2022 10:31:46 -0700 (PDT) Received: from lakrids.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 6BD133F73B; Wed, 27 Apr 2022 10:31:45 -0700 (PDT) From: Mark Rutland To: linux-arm-kernel@lists.infradead.org Cc: akpm@linux-foundation.org, alex.popov@linux.com, catalin.marinas@arm.com, keescook@chromium.org, linux-kernel@vger.kernel.org, luto@kernel.org, mark.rutland@arm.com, will@kernel.org Subject: [PATCH v2 06/13] stackleak: rework stack high bound handling Date: Wed, 27 Apr 2022 18:31:21 +0100 Message-Id: <20220427173128.2603085-7-mark.rutland@arm.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220427173128.2603085-1-mark.rutland@arm.com> References: <20220427173128.2603085-1-mark.rutland@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Prior to returning to userpace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and restting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook --- include/linux/stackleak.h | 14 ++++++++++++++ kernel/stackleak.c | 19 ++++++++++++++----- 2 files changed, 28 insertions(+), 5 deletions(-) diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h index 67430faa5c518..467661aeb4136 100644 --- a/include/linux/stackleak.h +++ b/include/linux/stackleak.h @@ -28,6 +28,20 @@ stackleak_task_low_bound(const struct task_struct *tsk) return (unsigned long)end_of_stack(tsk) + sizeof(unsigned long); } +/* + * The address immediately after the highest address on tsk's stack which we + * can plausibly erase. + */ +static __always_inline unsigned long +stackleak_task_high_bound(const struct task_struct *tsk) +{ + /* + * The task's pt_regs lives at the top of the task stack and will be + * overwritten by exception entry, so there's no need to erase them. + */ + return (unsigned long)task_pt_regs(tsk); +} + static inline void stackleak_task_init(struct task_struct *t) { t->lowest_stack = stackleak_task_low_bound(t); diff --git a/kernel/stackleak.c b/kernel/stackleak.c index d5f684dc0a2d9..ba346d46218f5 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -73,6 +73,7 @@ late_initcall(stackleak_sysctls_init); static __always_inline void __stackleak_erase(void) { const unsigned long task_stack_low = stackleak_task_low_bound(current); + const unsigned long task_stack_high = stackleak_task_high_bound(current); unsigned long erase_low = current->lowest_stack; unsigned long erase_high; unsigned int poison_count = 0; @@ -93,14 +94,22 @@ static __always_inline void __stackleak_erase(void) #endif /* - * Now write the poison value to the kernel stack between 'erase_low' - * and 'erase_high'. We assume that the stack pointer doesn't change - * when we write poison. + * Write poison to the task's stack between 'erase_low' and + * 'erase_high'. + * + * If we're running on a different stack (e.g. an entry trampoline + * stack) we can erase everything below the pt_regs at the top of the + * task stack. + * + * If we're running on the task stack itself, we must not clobber any + * stack used by this function and its caller. We assume that this + * function has a fixed-size stack frame, and the current stack pointer + * doesn't change while we write poison. */ if (on_thread_stack()) erase_high = current_stack_pointer; else - erase_high = current_top_of_stack(); + erase_high = task_stack_high; while (erase_low < erase_high) { *(unsigned long *)erase_low = STACKLEAK_POISON; @@ -108,7 +117,7 @@ static __always_inline void __stackleak_erase(void) } /* Reset the 'lowest_stack' value for the next syscall */ - current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64; + current->lowest_stack = task_stack_high; } asmlinkage void noinstr stackleak_erase(void) -- 2.30.2 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D7886C433EF for ; Wed, 27 Apr 2022 17:44:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=u7Vi+T6fiIVjxnP5+UvlOdtqcLTM5cTPp06mDyR1N38=; b=cT04h38s9a4iRE qpFJLsThm7lBSRtXirUMpiCAe0C5Oe0pLFWhCI5+s7dH6yA6re4O44Kw+KBuESgNh6wV31EhZJV6c 6hSW/bCzkiHe0hnaKJjGerXrZ0Zb73DtcsF89ZWkY1rPXAxuSTwW3nvYauel9vlYEkvy18FKPI+Qd dUQ2s82tcTEwGt5Gv6Q/aTO3caqUiNx3NP6zY9HnNli3sBJzPw4j9ZASLxq/ehgAlnHdoRA/y0fDZ xo3FsPGYuTjMjJ8RBcLX1NkjeIJfzeIvi7UnwvtmlDPWw1UM1wblYZSRkhSLhMIwuK1qXxy8Ya3wh iIgELStcnwOamz14Ting==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1njlhC-002kg8-US; Wed, 27 Apr 2022 17:43:23 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1njlVz-002gX4-78 for linux-arm-kernel@lists.infradead.org; Wed, 27 Apr 2022 17:31:49 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A00F8ED1; Wed, 27 Apr 2022 10:31:46 -0700 (PDT) Received: from lakrids.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 6BD133F73B; Wed, 27 Apr 2022 10:31:45 -0700 (PDT) From: Mark Rutland To: linux-arm-kernel@lists.infradead.org Cc: akpm@linux-foundation.org, alex.popov@linux.com, catalin.marinas@arm.com, keescook@chromium.org, linux-kernel@vger.kernel.org, luto@kernel.org, mark.rutland@arm.com, will@kernel.org Subject: [PATCH v2 06/13] stackleak: rework stack high bound handling Date: Wed, 27 Apr 2022 18:31:21 +0100 Message-Id: <20220427173128.2603085-7-mark.rutland@arm.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220427173128.2603085-1-mark.rutland@arm.com> References: <20220427173128.2603085-1-mark.rutland@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220427_103147_376667_B687D7F1 X-CRM114-Status: GOOD ( 23.44 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Prior to returning to userpace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and restting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook --- include/linux/stackleak.h | 14 ++++++++++++++ kernel/stackleak.c | 19 ++++++++++++++----- 2 files changed, 28 insertions(+), 5 deletions(-) diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h index 67430faa5c518..467661aeb4136 100644 --- a/include/linux/stackleak.h +++ b/include/linux/stackleak.h @@ -28,6 +28,20 @@ stackleak_task_low_bound(const struct task_struct *tsk) return (unsigned long)end_of_stack(tsk) + sizeof(unsigned long); } +/* + * The address immediately after the highest address on tsk's stack which we + * can plausibly erase. + */ +static __always_inline unsigned long +stackleak_task_high_bound(const struct task_struct *tsk) +{ + /* + * The task's pt_regs lives at the top of the task stack and will be + * overwritten by exception entry, so there's no need to erase them. + */ + return (unsigned long)task_pt_regs(tsk); +} + static inline void stackleak_task_init(struct task_struct *t) { t->lowest_stack = stackleak_task_low_bound(t); diff --git a/kernel/stackleak.c b/kernel/stackleak.c index d5f684dc0a2d9..ba346d46218f5 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -73,6 +73,7 @@ late_initcall(stackleak_sysctls_init); static __always_inline void __stackleak_erase(void) { const unsigned long task_stack_low = stackleak_task_low_bound(current); + const unsigned long task_stack_high = stackleak_task_high_bound(current); unsigned long erase_low = current->lowest_stack; unsigned long erase_high; unsigned int poison_count = 0; @@ -93,14 +94,22 @@ static __always_inline void __stackleak_erase(void) #endif /* - * Now write the poison value to the kernel stack between 'erase_low' - * and 'erase_high'. We assume that the stack pointer doesn't change - * when we write poison. + * Write poison to the task's stack between 'erase_low' and + * 'erase_high'. + * + * If we're running on a different stack (e.g. an entry trampoline + * stack) we can erase everything below the pt_regs at the top of the + * task stack. + * + * If we're running on the task stack itself, we must not clobber any + * stack used by this function and its caller. We assume that this + * function has a fixed-size stack frame, and the current stack pointer + * doesn't change while we write poison. */ if (on_thread_stack()) erase_high = current_stack_pointer; else - erase_high = current_top_of_stack(); + erase_high = task_stack_high; while (erase_low < erase_high) { *(unsigned long *)erase_low = STACKLEAK_POISON; @@ -108,7 +117,7 @@ static __always_inline void __stackleak_erase(void) } /* Reset the 'lowest_stack' value for the next syscall */ - current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64; + current->lowest_stack = task_stack_high; } asmlinkage void noinstr stackleak_erase(void) -- 2.30.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel