From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751628AbcGMShH (ORCPT ); Wed, 13 Jul 2016 14:37:07 -0400 Received: from mail-vk0-f46.google.com ([209.85.213.46]:35299 "EHLO mail-vk0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750872AbcGMShA convert rfc822-to-8bit (ORCPT ); Wed, 13 Jul 2016 14:37:00 -0400 MIME-Version: 1.0 In-Reply-To: <578601B3.3050903@de.ibm.com> References: <578601B3.3050903@de.ibm.com> From: Andy Lutomirski Date: Wed, 13 Jul 2016 11:36:28 -0700 Message-ID: Subject: Re: [PATCH v5 00/32] virtually mapped stacks and thread_info cleanup To: Christian Borntraeger Cc: Andy Lutomirski , X86 ML , "linux-kernel@vger.kernel.org" , linux-arch , Borislav Petkov , Nadav Amit , Kees Cook , Brian Gerst , "kernel-hardening@lists.openwall.com" , Linus Torvalds , Josh Poimboeuf , Jann Horn , Heiko Carstens , linux-s390 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 13, 2016 at 1:54 AM, Christian Borntraeger wrote: > On 07/11/2016 10:53 PM, Andy Lutomirski wrote: >> Hi all- >> >> Since the dawn of time, a kernel stack overflow has been a real PITA >> to debug, has caused nondeterministic crashes some time after the >> actual overflow, and has generally been easy to exploit for root. >> >> With this series, arches can enable HAVE_ARCH_VMAP_STACK. Arches >> that enable it (just x86 for now) get virtually mapped stacks with >> guard pages. This causes reliable faults when the stack overflows. >> >> If the arch implements it well, we get a nice OOPS on stack overflow >> (as opposed to panicing directly or otherwise exploding badly). On >> x86, the OOPS is nice, has a usable call trace, and the overflowing >> task is killed cleanly. >> >> This series (starting with v4) also extensively cleans up >> thread_info. thread_info has been partially redundant with >> thread_struct for a long time -- both are places for arch code to >> add additional per-task variables. thread_struct is much cleaner: >> it's always in task_struct, and there's nothing particularly magical >> about it. So this series contains a bunch of cleanups on x86 to >> move almost everything from thread_info to thread_struct (which, >> even by itself, deletes more code than it adds) and to remove x86's >> dependence on thread_info's position on the stack. Then it opts x86 >> into a new config option THREAD_INFO_IN_TASK to get rid of >> arch-specific thread_info entirely and simply embed a defanged >> thread_info (containing only flags) and 'int cpu' into task_struct. >> >> Once thread_info stops being magical, there's another benefit: we >> can free the thread stack as soon as the task is dead (without >> waiting for RCU) and then, if vmapped stacks are in use, cache the >> entire stack for reuse on the same cpu. >> >> This seems to be an overall speedup of about 0.5-1 µs per >> pthread_create/join in a simple test -- a percpu cache of vmalloced >> stacks appears to be a bit faster than a high-order stack >> allocation, at least when the cache hits. (I expect that workloads >> with a low cache hit rate are likely to be dominated by other >> effects anyway.) >> >> This does not address interrupt stacks. >> >> It's worth noting that s390 has an arch-specific gcc feature that >> detects stack overflows by adjusting function prologues. Arches >> with features like that may wish to avoid using vmapped stacks to >> minimize the performance hit. > > Yes, might not need this for stack overflow detection. What might > be interesting is the thread_info/thread_struct change, if we can > strip down thread_info.(CONFIG_THREAD_INFO_IN_TASK). Would it actually > make sense to separate these two changes to see what performance > impact CONFIG_THREAD_INFO_IN_TASK has on its own? > They're already separated. CONFIG_THREAD_INFO_IN_TASK should have basically no performance impact unless there are arch-dependent (percpu?) issues involved. It does enable immediate thread stack deallocation, though, and it would be straightforward to make CONFIG_THREAD_INFO_IN_TASK cache stacks even if CONFIG_VMAP_STACK=n. That should be a moderate clone() speedup. -- Andy Lutomirski AMA Capital Management, LLC