Date: Mon, 22 Aug 2016 15:27:19 -0500
From: Josh Poimboeuf
To: Kees Cook
Cc: Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", "x86@kernel.org",
	LKML, Andy Lutomirski, Linus Torvalds, Steven Rostedt, Brian Gerst,
	Peter Zijlstra, Frederic Weisbecker, Byungchul Park, Nilay Vaish
Subject: Re: [PATCH v4 54/57] x86/mm: convert arch_within_stack_frames()
	to use the new unwinder
Message-ID: <20160822202719.gi2qwjvpakesdzop@treble>
References: <62fab36288792edae0181274641d6b4c62157fea.1471525031.git.jpoimboe@redhat.com>
	<20160819215522.ofav5ifdn7i5taxm@treble>
In-Reply-To: <20160819215522.ofav5ifdn7i5taxm@treble>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 19, 2016 at 04:55:22PM -0500, Josh Poimboeuf wrote:
> On Fri, Aug 19, 2016 at 11:27:18AM -0700, Kees Cook wrote:
> > On Thu, Aug 18, 2016 at 6:06 AM, Josh Poimboeuf wrote:
> > > Convert arch_within_stack_frames() to use the new unwinder.
> > >
> > > This also changes some existing behavior:
> > >
> > > - Skip checking of pt_regs frames.
> > > - Warn if it can't reach the grandparent's stack frame.
> > > - Warn if it doesn't unwind to the end of the stack.
> > >
> > > Signed-off-by: Josh Poimboeuf
> >
> > All the stuff touching usercopy looks good to me. One question,
> > though, in looking through the unwinder.
> > It seems like it's much more
> > complex than just the frame-hopping that the old
> > arch_within_stack_frames() did, but I'm curious to hear what you think
> > about its performance. We'll be calling this with every usercopy that
> > touches the stack, so I'd like to be able to estimate the performance
> > impact of this replacement...
>
> Yeah, good point. I'll take some measurements from before and after and
> get back to you.

I took some before/after measurements by enclosing the affected
functions with ktime calls to get the total time spent in each function,
and did a "find /usr >/dev/null" to trigger a bunch of user copies.

         copy_to/from_user  check_object_size  arch_within_stack_frames
before:  13ms               6.8ms              0.61ms
after:   17ms               11ms               4.6ms

The unwinder port made arch_within_stack_frames() *much* (8x) slower
than its current simple implementation, and added about 30% (4ms) to the
total copy_to/from_user() run time.

Note that hardened usercopy itself is already quite slow: it made user
copies about 52% slower. With the unwinder port, that worsened to ~65%.

"find /usr" took about 170ms of kernel time and 2.3s total. So the
unwinder port added about 2% on the kernel side and 0.2% total for this
particular test case. Though I'm sure there are more I/O-intensive
workloads out there which would be more adversely affected.

I haven't yet looked to see where the bottlenecks are and if there could
be any obvious performance improvements.

BTW, ignoring the performance issues, using the unwinder here would have
some benefits:

- It protects pt_regs frames from being changed. For example, during a
  page fault operation, the saved regs->ip on the stack is protected.
- Unlike the existing code, it could potentially work with
  __copy_from_user_inatomic() and copy_from_user_nmi(), which can copy
  to/from an irq/exception stack. (I think check_stack_object() would
  need to be rewritten a bit so that it doesn't always assume the task
  stack.)
- It complains loudly if there's stack corruption or something else goes
  wrong with walking the stack instead of just silently failing.
- The same code could also work with DWARF if we ever add a DWARF
  unwinder (with a possible tweak to the unwinder API to get the stack
  frame header size).

--
Josh
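
For anyone who wants to re-derive the percentages quoted in this message
from the before/after timings, here is a minimal sketch of the
arithmetic. The dictionary keys and the summarize() helper are
illustrative names only, not anything from the kernel or the patch. The
inputs are just the rounded figures reported above, so the results are
approximate.

```python
# Timings copied from the before/after measurements in this message (ms).
# Keys are illustrative shorthand:
#   copy   = copy_to/from_user
#   check  = check_object_size
#   frames = arch_within_stack_frames
before = {"copy": 13.0, "check": 6.8, "frames": 0.61}
after = {"copy": 17.0, "check": 11.0, "frames": 4.6}

KERNEL_MS = 170.0   # kernel time reported for "find /usr"
TOTAL_MS = 2300.0   # total run time reported for "find /usr"


def summarize(before, after):
    """Derive the ratios discussed in the message from the raw timings."""
    added_ms = after["copy"] - before["copy"]
    return {
        # How much slower arch_within_stack_frames() became.
        "frames_slowdown": after["frames"] / before["frames"],
        # Extra time added to copy_to/from_user(), absolute and relative.
        "added_ms": added_ms,
        "added_vs_copy": added_ms / before["copy"],
        # Share of copy_to/from_user() time spent in check_object_size().
        "check_share_before": before["check"] / before["copy"],
        "check_share_after": after["check"] / after["copy"],
        # Extra time relative to the whole "find /usr" run.
        "added_vs_kernel": added_ms / KERNEL_MS,
        "added_vs_total": added_ms / TOTAL_MS,
    }


if __name__ == "__main__":
    for name, value in summarize(before, after).items():
        print(f"{name}: {value:.4f}")
```

With these inputs the slowdown factor comes out at roughly 7.5, which is
the "8x" above after rounding. The 52% and ~65% figures appear to
correspond to check_object_size's share of the total copy_to/from_user
time before and after the port (6.8/13 and 11/17).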