Date: Tue, 18 Oct 2016 15:14:00 +0200
From: Peter Zijlstra
To: Waiman Long
Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
        Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
        Davidlohr Bueso, Tim Chen, Terry Rudd, "Paul E. McKenney",
        Jason Low, Chris Wilson, Daniel Vetter
Subject: Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop
Message-ID: <20161018131400.GY3117@twins.programming.kicks-ass.net>
References: <20161007145243.361481786@infradead.org>
        <20161007150211.271490994@infradead.org>
        <58055BE2.1040908@hpe.com>
In-Reply-To: <58055BE2.1040908@hpe.com>

On Mon, Oct 17, 2016 at 07:16:50PM -0400, Waiman Long wrote:
> >+++ b/kernel/locking/mutex.c
> >@@ -631,13 +631,21 @@ __mutex_lock_common(struct mutex *lock,
> >
> >         lock_contended(&lock->dep_map, ip);
> >
> >+        set_task_state(task, state);
>
> Do we want to set the state here? I am not sure if it is OK to set the
> task state without ever calling schedule().

That's entirely fine; note how we'll set it back to RUNNING at the end.

> >         for (;;) {
> >+                /*
> >+                 * Once we hold wait_lock, we're serialized against
> >+                 * mutex_unlock() handing the lock off to us, do a trylock
> >+                 * before testing the error conditions to make sure we pick up
> >+                 * the handoff.
> >+                 */
> >                 if (__mutex_trylock(lock, first))
> >-                        break;
> >+                        goto acquired;
> >
> >                 /*
> >-                 * got a signal? (This code gets eliminated in the
> >-                 * TASK_UNINTERRUPTIBLE case.)
> >+                 * Check for signals and wound conditions while holding
> >+                 * wait_lock. This ensures the lock cancellation is ordered
> >+                 * against mutex_unlock() and wake-ups do not go missing.
> >                  */
> >                 if (unlikely(signal_pending_state(state, task))) {
> >                         ret = -EINTR;
> >@@ -650,16 +658,27 @@ __mutex_lock_common(struct mutex *lock,
> >                         goto err;
> >                 }
> >
> >-                __set_task_state(task, state);
> >                 spin_unlock_mutex(&lock->wait_lock, flags);
> >                 schedule_preempt_disabled();
> >-                spin_lock_mutex(&lock->wait_lock, flags);
> >
> >                 if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >                         first = true;
> >                         __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >                 }
> >+
> >+                set_task_state(task, state);
>
> I would suggest keeping the __set_task_state() above and changing
> set_task_state(task, state) to set_task_state(task, TASK_RUNNING) to
> provide the memory barrier. Then we don't need to add
> __set_task_state() calls below.

set_task_state(RUNNING) doesn't make sense, ever. See the comment near
set_task_state() for the reason it has a barrier.

We need it here because when we do that trylock (or optimistic spin) we
need to have set the state and done a barrier; otherwise we can miss a
wakeup and get stuck.
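
IOW the usual sleep/wakeup pattern (a generic sketch, not the actual
mutex code; 'cond' and 'waiter' are stand-ins, with cond playing the
role of the trylock/handoff test and the waker side of mutex_unlock()):

        /* waiter */
        for (;;) {
                set_task_state(current, TASK_UNINTERRUPTIBLE);
                /* set_task_state() is a store followed by smp_mb() */
                if (cond)                       /* here: __mutex_trylock() */
                        break;
                schedule();
        }
        __set_task_state(current, TASK_RUNNING);

        /* waker */
        cond = true;                            /* here: unlock / handoff */
        smp_mb();                               /* pairs with the barrier above */
        wake_up_process(waiter);

If the waiter were to test cond before setting its state (or without
the barrier), the waker could observe TASK_RUNNING, wake_up_process()
would be a no-op, and the waiter would then block in schedule() on a
condition that is already true.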
> >+                /*
> >+                 * Here we order against unlock; we must either see it change
> >+                 * state back to RUNNING and fall through the next schedule(),
> >+                 * or we must see its unlock and acquire.
> >+                 */
> >+                if (__mutex_trylock(lock, first))
> >+                        break;
> >+
>
> I don't think we need a trylock here since we are going to do it at
> the top of the loop within wait_lock anyway.

The idea was to avoid the wait-time of that lock acquire; also, this is
a place-holder for the optimistic spin site for the next patch.
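
That is, roughly (loop tail paraphrased, not the full hunk):

        /*
         * A successful trylock here takes the lock straight after
         * waking up; otherwise we first have to win wait_lock, which
         * the unlocker and other waiters may be contending, before we
         * can even retry the trylock at the top of the loop.
         */
        if (__mutex_trylock(lock, first))
                break;

        spin_lock_mutex(&lock->wait_lock, flags);

And that same spot is where the optimistic spin will go.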