From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754316AbbDJLOf (ORCPT );
	Fri, 10 Apr 2015 07:14:35 -0400
Received: from mail-wg0-f43.google.com ([74.125.82.43]:34616 "EHLO
	mail-wg0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752288AbbDJLOc (ORCPT );
	Fri, 10 Apr 2015 07:14:32 -0400
Date: Fri, 10 Apr 2015 13:14:27 +0200
From: Ingo Molnar 
To: "Paul E. McKenney" 
Cc: Linus Torvalds , Jason Low , Peter Zijlstra ,
	Davidlohr Bueso , Tim Chen ,
	Aswin Chandramouleeswaran , LKML 
Subject: [PATCH] x86/uaccess: Implement get_kernel()
Message-ID: <20150410111427.GA30477@gmail.com>
References: <1428561611.3506.78.camel@j-VirtualBox>
	<20150409075311.GA4645@gmail.com>
	<20150409175652.GI6464@linux.vnet.ibm.com>
	<20150409183926.GM6464@linux.vnet.ibm.com>
	<20150410090051.GA28549@gmail.com>
	<20150410091252.GA27630@gmail.com>
	<20150410092152.GA21332@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150410092152.GA21332@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

* Ingo Molnar wrote:

> * Ingo Molnar wrote:
>
> > Yeah, so what I missed here are those nops: placeholders for the
> > STAC/CLAC instructions on x86... and this is what Linus mentioned
> > about the clac() overhead.
> >
> > But this could be solved I think: by adding a
> > copy_from_kernel_inatomic() primitive which simply leaves out the
> > STAC/CLAC sequence: as these are always guaranteed to be kernel
> > addresses, the SMAP fault should not be generated.
>
> So the first step would be to introduce a generic
> __copy_from_kernel_inatomic() primitive as attached below.
>
> The next patch will implement efficient __copy_from_kernel_inatomic()
> for x86.

The patch below does that. Note, for simplicity I've changed the
interface to 'get_kernel()' (will propagate this through the other
patches as well).

Minimally boot tested. owner-spinning looks sane now:

0000000000000030 :
  30:   48 3b 37                cmp    (%rdi),%rsi
  33:   55                      push   %rbp
  34:   48 89 e5                mov    %rsp,%rbp
  37:   75 4f                   jne    88
  39:   48 8d 56 28             lea    0x28(%rsi),%rdx
  3d:   8b 46 28                mov    0x28(%rsi),%eax
  40:   85 c0                   test   %eax,%eax
  42:   74 3c                   je     80
  44:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
  4b:   00 00
  4d:   48 8b 80 10 c0 ff ff    mov    -0x3ff0(%rax),%rax
  54:   a8 08                   test   $0x8,%al
  56:   75 28                   jne    80
  58:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
  5f:   00 00
  61:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  68:   f3 90                   pause
  6a:   48 3b 37                cmp    (%rdi),%rsi
  6d:   75 19                   jne    88
  6f:   8b 02                   mov    (%rdx),%eax
  71:   85 c0                   test   %eax,%eax
  73:   74 0b                   je     80
  75:   48 8b 81 10 c0 ff ff    mov    -0x3ff0(%rcx),%rax
  7c:   a8 08                   test   $0x8,%al
  7e:   74 e8                   je     68
  80:   31 c0                   xor    %eax,%eax
  82:   5d                      pop    %rbp
  83:   c3                      retq
  84:   0f 1f 40 00             nopl   0x0(%rax)
  88:   b8 01 00 00 00          mov    $0x1,%eax
  8d:   5d                      pop    %rbp
  8e:   c3                      retq
  8f:   90                      nop

although the double unrolled need_resched() check looks silly:

  4d:   48 8b 80 10 c0 ff ff    mov    -0x3ff0(%rax),%rax
  54:   a8 08                   test   $0x8,%al

  75:   48 8b 81 10 c0 ff ff    mov    -0x3ff0(%rcx),%rax
  7c:   a8 08                   test   $0x8,%al

Thanks,

	Ingo

=============================>
>From 7f5917c7f9ee3598b353adbd3d0849ecf4485748 Mon Sep 17 00:00:00 2001
From: Ingo Molnar 
Date: Fri, 10 Apr 2015 13:01:39 +0200
Subject: [PATCH] x86/uaccess: Implement get_kernel()

Implement an optimized get_kernel() primitive on x86: avoid the
STAC/CLAC overhead. Only offer a 64-bit variant for now, 32-bit will
fall back to get_user().
Not-Yet-Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/uaccess_64.h | 32 ++++++++++++++++++++++++++++++++
 include/linux/uaccess.h           |  8 ++------
 kernel/locking/mutex.c            |  3 +--
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f2f9b39b274a..3ef75f0addac 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -207,6 +207,38 @@ __copy_from_user_inatomic(void *dst, const void __user *src, unsigned size)
 	return __copy_from_user_nocheck(dst, src, size);
 }
 
+
+#define __get_kernel_asm_ex(dst, src, int_type, reg_type, lh_type)	\
+	asm volatile("1: mov"int_type" %1,%"reg_type"0\n"		\
+		     "2:\n"						\
+		     _ASM_EXTABLE_EX(1b, 2b)				\
+		     : lh_type(dst) : "m" (__m(src)))
+
+extern void __get_kernel_BUILD_ERROR(void);
+
+/*
+ * Simple copy-from-possibly-faulting-kernel-addresses method that
+ * avoids the STAC/CLAC SMAP overhead.
+ *
+ * NOTE: this does not propagate the error code of faulting kernel
+ * addresses properly. You can recover it via uaccess_catch()
+ * if you really need to.
+ */
+#define get_kernel(dst, src)						\
+do {									\
+	typeof(*(src)) __val;						\
+									\
+	switch (sizeof(__val)) {					\
+	case 1: __get_kernel_asm_ex(__val, src, "b", "b", "=q"); break;	\
+	case 2: __get_kernel_asm_ex(__val, src, "w", "w", "=r"); break;	\
+	case 4: __get_kernel_asm_ex(__val, src, "l", "k", "=r"); break;	\
+	case 8: __get_kernel_asm_ex(__val, src, "q", " ", "=r"); break;	\
+	default: __get_kernel_BUILD_ERROR();				\
+	}								\
+	(dst) = __val;							\
+} while (0)
+#define ARCH_HAS_GET_KERNEL
+
 static __must_check __always_inline int
 __copy_to_user_inatomic(void __user *dst, const void *src, unsigned size)
 {
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 885eea43b69f..03a025b98d2e 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -111,12 +111,8 @@ extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size
  * Generic wrapper, most architectures can just use __copy_from_user_inatomic()
  * to implement __copy_from_kernel_inatomic():
  */
-#ifndef ARCH_HAS_COPY_FROM_KERNEL_INATOMIC
-static __must_check __always_inline int
-__copy_from_kernel_inatomic(void *dst, const void __user *src, unsigned size)
-{
-	return __copy_from_user_inatomic(dst, src, size);
-}
+#ifndef ARCH_HAS_GET_KERNEL
+# define get_kernel(dst, src) get_user(dst, src)
 #endif
 
 #endif		/* __LINUX_UACCESS_H__ */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a4f74cda9fc4..68c750f4e8e8 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -226,7 +226,6 @@ static noinline
 bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 {
 	int on_cpu;
-	int ret;
 
 	while (lock->owner == owner) {
 		/*
@@ -252,7 +251,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 		 * NOTE2: We ignore failed copies, as the next iteration will clean
 		 * up after us. This saves an extra branch in the common case.
 		 */
-		ret = __copy_from_kernel_inatomic(&on_cpu, &owner->on_cpu, sizeof(on_cpu));
+		get_kernel(on_cpu, &owner->on_cpu);
 
 		if (!on_cpu || need_resched())
 			return false;
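
For completeness, a minimal (untested, purely illustrative) usage sketch of
the new primitive; the helper below is hypothetical and not part of the
patch - the real conversion is the mutex_spin_on_owner() hunk above:

	#include <linux/sched.h>
	#include <linux/uaccess.h>

	/*
	 * Hypothetical example: read a word-sized field from a kernel
	 * structure that may be freed from under us. On x86-64,
	 * get_kernel() compiles to a plain MOV plus an exception-table
	 * entry (no STAC/CLAC); other architectures fall back to
	 * get_user(). A faulting access does not oops, but the error is
	 * not propagated either, so the caller has to tolerate a bogus
	 * value - just like the owner-spinning loop does.
	 */
	static bool owner_looks_runnable(struct task_struct *owner)
	{
		int on_cpu = 0;

		get_kernel(on_cpu, &owner->on_cpu);

		return on_cpu != 0;
	}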