From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755577AbbDJL1y (ORCPT ); Fri, 10 Apr 2015 07:27:54 -0400
Received: from mail-wi0-f179.google.com ([209.85.212.179]:38315 "EHLO
	mail-wi0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754723AbbDJL1x (ORCPT );
	Fri, 10 Apr 2015 07:27:53 -0400
Date: Fri, 10 Apr 2015 13:27:48 +0200
From: Ingo Molnar
To: "Paul E. McKenney"
Cc: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso,
	Tim Chen, Aswin Chandramouleeswaran, LKML
Subject: [PATCH] mutex: Improve mutex_spin_on_owner() code generation
Message-ID: <20150410112748.GB30477@gmail.com>
References: <20150409075311.GA4645@gmail.com>
	<20150409175652.GI6464@linux.vnet.ibm.com>
	<20150409183926.GM6464@linux.vnet.ibm.com>
	<20150410090051.GA28549@gmail.com>
	<20150410091252.GA27630@gmail.com>
	<20150410092152.GA21332@gmail.com>
	<20150410111427.GA30477@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150410111427.GA30477@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Ingo Molnar wrote:

> although the double unrolled need_resched() check looks silly:
>
>   4d:	48 8b 80 10 c0 ff ff 	mov    -0x3ff0(%rax),%rax
>   54:	a8 08                	test   $0x8,%al
>
>   75:	48 8b 81 10 c0 ff ff 	mov    -0x3ff0(%rcx),%rax
>   7c:	a8 08                	test   $0x8,%al

The patch below fixes that and shaves 40 bytes off mutex_spin_on_owner()'s
code size.

Thanks,

	Ingo

===================================>
>From 065e46b7398e38f2e4be98c453e797ee511170e2 Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Fri, 10 Apr 2015 13:21:24 +0200
Subject: [PATCH] mutex: Improve mutex_spin_on_owner() code generation

GCC somewhat stupidly decides that the loop within mutex_spin_on_owner()
needs unrolling:

0000000000000030 <mutex_spin_on_owner>:
  30:	48 3b 37             	cmp    (%rdi),%rsi
  33:	55                   	push   %rbp
  34:	48 89 e5             	mov    %rsp,%rbp
  37:	75 4f                	jne    88
  39:	48 8d 56 28          	lea    0x28(%rsi),%rdx
  3d:	8b 46 28             	mov    0x28(%rsi),%eax
  40:	85 c0                	test   %eax,%eax
  42:	74 3c                	je     80
  44:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
  4b:	00 00
  4d:	48 8b 80 10 c0 ff ff 	mov    -0x3ff0(%rax),%rax
  54:	a8 08                	test   $0x8,%al
  56:	75 28                	jne    80
  58:	65 48 8b 0c 25 00 00 	mov    %gs:0x0,%rcx
  5f:	00 00
  61:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  68:	f3 90                	pause
  6a:	48 3b 37             	cmp    (%rdi),%rsi
  6d:	75 19                	jne    88
  6f:	8b 02                	mov    (%rdx),%eax
  71:	85 c0                	test   %eax,%eax
  73:	74 0b                	je     80
  75:	48 8b 81 10 c0 ff ff 	mov    -0x3ff0(%rcx),%rax
  7c:	a8 08                	test   $0x8,%al
  7e:	74 e8                	je     68
  80:	31 c0                	xor    %eax,%eax
  82:	5d                   	pop    %rbp
  83:	c3                   	retq
  84:	0f 1f 40 00          	nopl   0x0(%rax)
  88:	b8 01 00 00 00       	mov    $0x1,%eax
  8d:	5d                   	pop    %rbp
  8e:	c3                   	retq

The need_resched() check is duplicated:

  4d:	48 8b 80 10 c0 ff ff 	mov    -0x3ff0(%rax),%rax
  54:	a8 08                	test   $0x8,%al
  56:	75 28                	jne    80

  75:	48 8b 81 10 c0 ff ff 	mov    -0x3ff0(%rcx),%rax
  7c:	a8 08                	test   $0x8,%al
  7e:	74 e8                	je     68

So restructure the loop a bit, to get much tighter code:

0000000000000030 <mutex_spin_on_owner>:
  30:	55                   	push   %rbp
  31:	65 48 8b 14 25 00 00 	mov    %gs:0x0,%rdx
  38:	00 00
  3a:	48 89 e5             	mov    %rsp,%rbp
  3d:	48 39 37             	cmp    %rsi,(%rdi)
  40:	75 1e                	jne    60
  42:	8b 46 28             	mov    0x28(%rsi),%eax
  45:	85 c0                	test   %eax,%eax
  47:	74 0d                	je     56
  49:	f3 90                	pause
  4b:	48 8b 82 10 c0 ff ff 	mov    -0x3ff0(%rdx),%rax
  52:	a8 08                	test   $0x8,%al
  54:	74 e7                	je     3d
  56:	31 c0                	xor    %eax,%eax
  58:	5d                   	pop    %rbp
  59:	c3                   	retq
  5a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  60:	b8 01 00 00 00       	mov    $0x1,%eax
  65:	5d                   	pop    %rbp
  66:	c3                   	retq

What changed relative to the previous loop is that cpu_relax() is done
before the need_resched() check. This in fact makes sense: only after
waiting a bit (or a lot, on virtualized platforms) should we expect
'need_resched' to have changed.

Not-Yet-Signed-off-by: Ingo Molnar
---
 kernel/locking/mutex.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 68c750f4e8e8..45d2445e457a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -225,9 +225,12 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
 static noinline
 bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 {
-	int on_cpu;
+	int owner_on_cpu;
+
+	for (;;) {
+		if (unlikely(lock->owner != owner))
+			return true;
 
-	while (lock->owner == owner) {
 		/*
 		 * Use the non-faulting copy-user primitive to get the owner->on_cpu
 		 * value that works even on CONFIG_DEBUG_PAGEALLOC=y instrumented
@@ -251,15 +254,16 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 		 * NOTE2: We ignore failed copies, as the next iteration will clean
 		 * up after us. This saves an extra branch in the common case.
 		 */
-		get_kernel(on_cpu, &owner->on_cpu);
-
-		if (!on_cpu || need_resched())
+		get_kernel(owner_on_cpu, &owner->on_cpu);
+		if (unlikely(!owner_on_cpu))
 			return false;
 
 		cpu_relax_lowlatency();
-	}
 
-	return true;
+		/* Stop spinning if we are to be preempted: */
+		if (need_resched())
+			return false;
+	}
 }
 
 /*
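
For easier review, here is a sketch of how the restructured spin loop reads
with the hunks above applied. It is just the '+' side of the diff stitched
together, not a separate change: the long comment block between the two
hunks is elided, the short comments here merely summarize it, and
get_kernel() is the non-faulting accessor this series relies on (not a
mainline primitive):

static noinline
bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
	int owner_on_cpu;

	for (;;) {
		/* The owner changed or released the lock, stop spinning on it: */
		if (unlikely(lock->owner != owner))
			return true;

		/*
		 * Non-faulting read of owner->on_cpu (see the comment block
		 * elided above); give up if the owner is not running:
		 */
		get_kernel(owner_on_cpu, &owner->on_cpu);
		if (unlikely(!owner_on_cpu))
			return false;

		/* Wait a bit before looking at need_resched() again: */
		cpu_relax_lowlatency();

		/* Stop spinning if we are to be preempted: */
		if (need_resched())
			return false;
	}
}

With this shape there is a single need_resched() test per iteration, which
is what gets rid of the duplicated TIF_NEED_RESCHED load visible in the
unrolled disassembly above.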